Scaling Writes


Most of the applications using Neo4j are read heavy and scale by getting more powerful servers or adding additional instances to the HA cluster. Writes, however, can be a little bit trickier. Before embarking on any of the following strategies, it is best that the server is tuned. See the Linux Performance Guide for details. One strategy we’ve seen already is splitting the reads and writes to the cluster, so the writes only go to the Master. The brave can even change the push factor to zero and set a pull interval only, in neo4j/conf/neo4j.properties:

ha.tx_push_factor = 0
ha.pull_interval = 5s

By changing the push factor from its default of 1 to 0, the slaves will only be updated when they pull from the master every 5 seconds.

Another strategy is to accumulate writes and write periodically. Let’s take a look at this more closely. I’m going to build a very simple performance test suite that points to a ruby application that will send requests to Neo4j. I’ll be using Gatling, which you may remember from last Valentine’s day. We’re going to create two tests to start out with. One will make a POST request to http://localhost:9292/node which will create one node at a time, and the other will send a POST request to http://localhost:9292/nodes which will accumulate them first and then write.

// Gatling 1.x imports
import com.excilys.ebi.gatling.core.Predef._
import com.excilys.ebi.gatling.http.Predef._
import akka.util.duration._

class CreateNode extends Simulation {
  val httpConf = httpConfig
    .baseURL("http://localhost:9292")

  val scn = scenario("Create Nodes")
    .repeat(5000) {
    exec(
      http("create node")
        .post("/node")
        .check(status.is(200)))
      .pause(0 milliseconds, 1 milliseconds)
  }

  setUp(
    scn.users(10).protocolConfig(httpConf)
  )
}
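
The nodes simulation is almost identical; here’s a minimal sketch, assuming only the names and the path change:

class CreateNodes extends Simulation {
  val httpConf = httpConfig
    .baseURL("http://localhost:9292")

  val scn = scenario("Create Nodes Batched")
    .repeat(5000) {
    exec(
      http("create nodes")
        .post("/nodes")
        .check(status.is(200)))
      .pause(0 milliseconds, 1 milliseconds)
  }

  setUp(
    scn.users(10).protocolConfig(httpConf)
  )
}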

The ruby application that listens for these requests looks like this:

post '/node' do
  $neo.create_node
  'node created'
end

post '/nodes' do
  $queue << [:create_node, {}]
  if $queue.size >= 100
    $neo.batch *$queue
    $queue = []
  end
  'nodes created'
end

The first takes the request and immediately sends it to Neo4j. The second accumulates the writes into a queue, and once that queue fills up to 100 it writes the requests in a single BATCH transaction. One of the beauties of the BATCH REST endpoint is that you can send it nodes to be created, relationships to be updated, Cypher queries, whatever you want.
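
For example, a single neography batch call can mix all of these in one round trip. A minimal sketch (the data and query are made up):

require 'neography'

$neo = Neography::Rest.new

# Two node creates, a relationship create ({0} and {1} refer to
# earlier items in the same batch), and a Cypher query
$neo.batch [:create_node, {"name" => "Max"}],
           [:create_node, {"name" => "Neo4j"}],
           [:create_relationship, "likes", "{0}", "{1}", {"since" => "2013"}],
           [:execute_query, "START n=node(*) RETURN COUNT(n)"]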

Let’s take a look at the performance numbers from Gatling. First, one node at a time:

[Gatling results: one node at a time]

Our mean is 20ms and we are doing 460 requests per second. Next, 100 nodes at a time:

[Gatling results: 100 nodes at a time]

We can see our mean latency decreased by 3x to 6ms and our requests per second increased by 3x to 1436. That’s pretty significant. OK, what if we commit every 5000 requests instead?

[Gatling results: 5000 nodes at a time]

We are able to get another 10% in requests per second, but our max response time jumped quite significantly. If we think about our application, this means most users will get fast response times, and one user every 5000 requests will sit there hating life.

[Image: accumulated writes]

So let’s take a look at another way to handle this. We’re going to completely decouple our application from writing to Neo4j, and instead write to…

[Image: RabbitMQ]

When we receive the request to make a new node, we’ll publish it to a RabbitMQ Exchange to be handled later.

post '/evented_nodes' do
  message = [:create_node, {}]
  $exchange.publish(message.to_msgpack)
  'node created'
end
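
For context, here’s a minimal sketch of the Bunny setup the publisher above and the writer service below both assume (the exchange and queue names are hypothetical):

require 'bunny'
require 'msgpack'

$connection = Bunny.new
$connection.start

$channel  = $connection.create_channel
$exchange = $channel.direct("neo4j.writes")

# The writer service declares a queue to bind to the same exchange
queue = $channel.queue("neo4j.write_queue")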

We can even reuse our accumulated strategy here:

post '/evented_accumulated_nodes' do
  $queue << [:create_node, {}]
  if $queue.size >= 100
    $exchange.publish($queue.to_msgpack)
    $queue = []
  end
  'nodes created'
end

A service subscribes to a queue bound to the exchange and grabs these messages.

queue.bind(exchange).subscribe(:ack => true, :block => true) do |delivery_info, metadata, payload|
  message = MessagePack.unpack(payload)
  # Remember the latest delivery tag so we can ack the whole batch at once
  $last = delivery_info.delivery_tag
  $messages << message
  $cnt += 1

  # Once the buffer is full, write the whole batch to Neo4j
  if $cnt >= MAX_BUFFER
    WriterConsumer.process_messages
  end
end

…and a consumer processes them:

def self.process_messages
  # Write when the buffer is full or enough time has passed,
  # but never send an empty batch (e.g. on an idle timer tick)
  if !$messages.empty? && ($cnt >= MAX_BUFFER || self.times_up?)
    batch_results = $neo.batch *$messages

    # Acknowledge every message up to the last delivery tag
    $channel.acknowledge($last, true)
    self.reset_variables
  end
end
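
The buffer globals and the times_up? and reset_variables helpers aren’t shown in the post; here’s a hypothetical sketch of what they could look like (names and values are illustrative):

MAX_BUFFER   = 100 # illustrative
TIME_TO_WAIT = 1   # seconds, illustrative

$messages   = []
$cnt        = 0
$last_write = Time.now

module WriterConsumer
  # True once TIME_TO_WAIT seconds have passed since the last flush
  def self.times_up?
    Time.now - $last_write >= TIME_TO_WAIT
  end

  # Clear the buffer and counters for the next batch
  def self.reset_variables
    $messages   = []
    $cnt        = 0
    $last_write = Time.now
  end
end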

I took a screen capture of RabbitMQ hard at work, processing about 1100 messages per second:

[Screenshot: RabbitMQ queues processing about 1100 messages per second]

So what does the performance look like on these? First the single evented node test:

[Gatling results: evented nodes]

The mean latency is 8ms, and the max latency is 40ms, both of which look great, but our requests per second went down to 1036. How about the accumulated evented node test:

[Gatling results: evented accumulated nodes]

Now we’re cooking. Our mean and max latencies are very small and our requests per second jumped to 1747.

If you take the time to read the ruby code, you may notice that I’m accumulating writes in the writer service as well, but besides waiting until I have a certain number of messages, I also have a timer that triggers the writes. You can use the same idea in your web app to commit every X writes or every Y seconds, to handle bursts of writes as well as slow periods.

# Set up timers (using the timers gem)
$timers = Timers::Group.new # Timers.new in older versions of the gem
$timers.every(TIME_TO_WAIT) { WriterConsumer.process_messages }

# Start the timer loop in a background thread
timer_thread = Thread.new do
  loop { $timers.wait }
end
timer_thread.abort_on_exception = true

Finding the right latency and throughput numbers for your application is important, so experiment with what makes sense for you. Also, make sure you run tests on the hardware you will be running in production; your laptop numbers will be completely different. The implementation of the accumulated writes technique I am using will not survive a web server crash, so in your production application consider using a durable form of storage like Redis, Riak, or Hazelcast instead of an in-memory Ruby array.
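
As a hypothetical example, the accumulation buffer could live in a Redis list instead of a Ruby array, so queued writes survive a crash (the key name and endpoint are made up, and a real version would need to handle concurrent workers):

require 'redis'
require 'msgpack'

$redis = Redis.new

post '/durable_nodes' do
  # The pending writes live in Redis, not in process memory
  $redis.rpush('neo4j:pending_writes', [:create_node, {}].to_msgpack)
  if $redis.llen('neo4j:pending_writes') >= 100
    messages = $redis.lrange('neo4j:pending_writes', 0, -1)
                     .map { |m| MessagePack.unpack(m) }
    $neo.batch *messages
    $redis.del('neo4j:pending_writes')
  end
  'nodes created'
end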
