Scaling Writes


Most applications using Neo4j are read heavy and scale by getting more powerful servers or by adding instances to the HA cluster. Writes, however, can be a little trickier. Before embarking on any of the following strategies, make sure the server is tuned; see the Linux Performance Guide for details. One strategy we’ve already seen is splitting reads and writes across the cluster, so that writes go only to the Master. The brave can even change the push factor to zero and set a pull interval only in neo4j/conf/

ha.tx_push_factor = 0
ha.pull_interval = 5s

Changing ha.tx_push_factor from its default of 1 to 0 means the master no longer pushes each committed transaction to a slave, so the slaves are only updated when they pull from the master every 5 seconds.
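
To make the read/write split concrete, here is a minimal sketch of a router that sends every write to the master and rotates reads across the slaves. The class name, the `:read`/`:write` intent symbols and the host names are all made up for illustration; nothing here is part of Neo4j or of any client library.

```ruby
# Hypothetical helper: picks an endpoint based on what the caller intends to do.
class ClusterRouter
  def initialize(master, slaves)
    @master = master
    @slaves = slaves
    @next = 0
  end

  # Writes always go to the master; reads rotate across the slaves.
  # Falls back to the master when no slaves are configured.
  def endpoint_for(intent)
    return @master if intent == :write || @slaves.empty?
    slave = @slaves[@next % @slaves.size]
    @next += 1
    slave
  end
end

# Placeholder addresses; substitute the members of your own cluster.
router = ClusterRouter.new('http://master.example:7474',
                           ['http://slave1.example:7474', 'http://slave2.example:7474'])
puts router.endpoint_for(:write)
puts router.endpoint_for(:read)
```

Keep in mind that with a push factor of 0, a read served by a slave can lag the master by up to the pull interval, so anything that must read its own write should go to the master too.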

Another strategy is to accumulate writes and write them periodically. Let’s take a closer look at this. I’m going to build a very simple performance test suite that points to a Ruby application, which in turn sends requests to Neo4j. I’ll be using Gatling, which you may remember from last Valentine’s day. We’re going to create two tests to start out with. One will make a POST request to http://localhost:9292/node, which creates one node at a time, and the other will send a POST request to http://localhost:9292/nodes, which accumulates the requests first and then writes them.

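class CreateNode extends Simulation {
  val httpConf = httpConfig
    .baseURL("http://localhost:9292")

  val scn = scenario("Create Nodes")
    .repeat(5000) {
      exec(
        http("create node")
          .post("/node"))
        .pause(0 milliseconds, 1 milliseconds)
    }

  // The setUp(...) call that sets the number of users is not shown here.
}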


I’ll skip the nodes code, but it’s almost identical. The Ruby application that listens for these requests looks like:

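  # Write each node to Neo4j as soon as the request arrives
  post '/node' do
    $neo.create_node
    'node created'
  end

  # Accumulate requests and write them 100 at a time
  post '/nodes' do
    $queue << [:create_node, {}]
    if $queue.size >= 100
      $neo.batch *$queue
      $queue = []
    end
    'nodes created'
  end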

The first takes the request and immediately sends it to Neo4j. The second accumulates the writes in a queue and, once that queue fills up to 100, writes the requests in a single batch transaction. One of the beauties of the batch REST endpoint is that you can send it nodes to be created, relationships to be updated, Cypher queries, whatever you want.
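
A global array works for a quick test, but on a threaded web server two requests could hit the size check at the same time. Here is a minimal sketch of the same accumulate-then-flush idea wrapped in a mutex. `WriteAccumulator` is a made-up name, and the block passed to it is a stand-in for the real write (something like `$neo.batch *batch`), so the example runs without a database.

```ruby
# Hypothetical thread-safe buffer: collects commands and hands them to a sink
# in batches of batch_size.
class WriteAccumulator
  def initialize(batch_size, &sink)
    @batch_size = batch_size
    @sink = sink
    @pending = []
    @lock = Mutex.new
  end

  # Queue one command; flush when the buffer reaches batch_size.
  def add(command)
    batch = nil
    @lock.synchronize do
      @pending << command
      if @pending.size >= @batch_size
        batch = @pending
        @pending = []
      end
    end
    @sink.call(batch) if batch # do the slow write outside the lock
    self
  end

  # Write whatever is left, for example on shutdown.
  def flush
    batch = @lock.synchronize do
      taken = @pending
      @pending = []
      taken
    end
    @sink.call(batch) unless batch.empty?
    self
  end

  def size
    @lock.synchronize { @pending.size }
  end
end

batches = []
acc = WriteAccumulator.new(100) { |batch| batches << batch.size } # stand-in for the batch write
250.times { acc.add([:create_node, {}]) }
acc.flush
p batches # => [100, 100, 50]
```

The request that fills the buffer still pays for the whole batch write, which is exactly the effect that shows up in the max response times.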

Let’s take a look at the performance numbers from Gatling. First one node at a time:


Our mean latency is 20ms and we are doing 460 requests per second. Next, 100 nodes at a time:


Our mean latency dropped from 20ms to 6ms, and our requests per second roughly tripled, from 460 to 1436. That’s pretty significant. OK, what if we commit every 5000 requests instead?


We get another 10% in requests per second, but our max response time jumped quite significantly. For our application, this means most users will get fast response times, while one user in every 5000 requests will sit there hating life.
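
One rough way to reason about this trade-off is a toy model: assume every request pays a small fixed cost, and the request that fills the buffer also waits for the whole batch write. The function name and all three cost parameters below are hypothetical placeholders, not measurements from these tests.

```ruby
# Toy model: each request costs base_ms; the request that triggers the flush
# also waits for the batch write, modelled as fixed_ms + per_item_ms * batch_size.
def latency_profile(batch_size, base_ms:, fixed_ms:, per_item_ms:)
  flush_ms = fixed_ms + per_item_ms * batch_size
  {
    typical_ms: base_ms,                          # what most requests see
    worst_ms: base_ms + flush_ms,                 # the unlucky request
    mean_ms: base_ms + flush_ms / batch_size.to_f # flush cost spread over the batch
  }
end

[100, 5000].each do |n|
  prof = latency_profile(n, base_ms: 1.0, fixed_ms: 10.0, per_item_ms: 0.5)
  puts format('batch=%d typical=%.1fms mean=%.3fms worst=%.1fms',
              n, prof[:typical_ms], prof[:mean_ms], prof[:worst_ms])
end
```

In this model a bigger batch only spreads the fixed cost a little thinner, so the mean barely moves, while the worst case grows in line with the batch size.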


So let’s take a look at another way to handle this. We’re going to completely decouple our application from writing to Neo4j, and instead write to…


When we receive the request to make a new node, we’ll publish it to a RabbitMQ Exchange to be handled later.

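  post '/evented_nodes' do
    message = [:create_node, {}]
    # Publish the serialized message; $exchange is an assumed name for the RabbitMQ exchange
    $exchange.publish(message.to_msgpack)
    'node created'
  end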

We can even reuse our accumulated strategy here:

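  post '/evented_accumulated_nodes' do
    $queue << [:create_node, {}]
    if $queue.size >= 100
      # Publish the whole batch as one message; $exchange is an assumed name
      $exchange.publish($queue.to_msgpack)
      $queue = []
    end
    'nodes created'
  end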

A service is subscribed to the queue of the exchange and grabs these messages.

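queue.bind(exchange).subscribe(:ack => true, :block => true) do |delivery_info, metadata, payload|
  message = MessagePack.unpack(payload)
  # Remember the latest delivery tag so the whole batch can be acknowledged at once
  $last = delivery_info.delivery_tag
  $messages << message
  $cnt += 1
  # Flush once the buffer is full
  WriterConsumer.process_messages if $cnt >= MAX_BUFFER
end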

…and a consumer processes them:

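  def self.process_messages
    if !$messages.empty? || self.times_up?
      batch_results = $neo.batch *$messages
      # Acknowledge every message up to and including the last delivery tag
      $channel.acknowledge($last, true)
      # Reset the buffer for the next batch (assumed; not shown in the excerpt)
      $messages = []
      $cnt = 0
    end
  end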
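
The important detail in the consumer is the order of operations: write the batch first, then acknowledge everything up to the last delivery tag in one call. Here is a minimal sketch of that pattern that runs without a broker. `FakeChannel` and `BatchingConsumer` are made-up stand-ins for the real channel and writer service, and the block plays the role of the batch write.

```ruby
# Stand-in for a broker channel; records the highest acknowledged delivery tag.
class FakeChannel
  attr_reader :acked_up_to

  def initialize
    @acked_up_to = 0
  end

  # With multiple = true, every delivery up to and including tag is acknowledged.
  def acknowledge(tag, multiple = false)
    @acked_up_to = tag if multiple && tag > @acked_up_to
  end
end

# Hypothetical consumer: buffers deliveries and writes them in batches.
class BatchingConsumer
  def initialize(channel, max_buffer, &writer)
    @channel = channel
    @max_buffer = max_buffer
    @writer = writer
    @messages = []
    @last_tag = nil
  end

  def on_delivery(tag, message)
    @messages << message
    @last_tag = tag
    process if @messages.size >= @max_buffer
  end

  # Write first, acknowledge second: if the writer raises, nothing is acked,
  # so the broker still considers the messages undelivered.
  def process
    return if @messages.empty?
    @writer.call(@messages)
    @channel.acknowledge(@last_tag, true)
    @messages = []
  end
end

channel = FakeChannel.new
written = []
consumer = BatchingConsumer.new(channel, 3) { |batch| written << batch.size }
(1..7).each { |tag| consumer.on_delivery(tag, [:create_node, {}]) }
consumer.process # flush the one leftover message
p written             # => [3, 3, 1]
p channel.acked_up_to # => 7
```

Acknowledging once per batch keeps the chatter with the broker down, at the cost of redelivering a whole batch if the writer dies halfway through.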

I took a screen capture of RabbitMQ hard at work, processing about 1100 messages per second:


So what does the performance look like on these? First the single evented node test:


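The mean latency is 8ms and the max latency is 40ms, both of which look great, but our requests per second came in at 1036. That is lower than the 1436 of the accumulated test, though still more than double the 460 of the one-node-at-a-time test. How about the accumulated evented node test: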


Now we’re cooking. Our mean and max latencies are very small and our requests per second jumped to 1747.

If you take the time to read the Ruby code you may notice that I’m accumulating writes in the writer service as well. Besides waiting until I have a certain number of messages, I also have a timer that triggers the writes. You can use the same idea in your web app and commit every x writes or every y seconds, whichever comes first, to handle bursts of writes as well as slow periods.

# Set up timers
$timers.every(TIME_TO_WAIT) { WriterConsumer.process_messages }

# Start timers
timer_thread = Thread.new do
  loop { $timers.wait }
end
timer_thread.abort_on_exception = true
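
If you want the "every x writes or every y seconds" behaviour in one place, a minimal sketch could look like the following. It uses only the standard library rather than a timers gem, and `CountOrTimeBuffer`, its parameters and the injectable clock are all assumptions made for the example; the clock is there so the behaviour can be checked without sleeping.

```ruby
# Hypothetical buffer that flushes when it holds max_items, or when the oldest
# pending item is at least max_age seconds old, whichever comes first.
class CountOrTimeBuffer
  def initialize(max_items:, max_age:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &sink)
    @max_items = max_items
    @max_age = max_age
    @clock = clock
    @sink = sink
    @items = []
    @oldest_at = nil
    @lock = Mutex.new
  end

  def add(item)
    batch = @lock.synchronize do
      @oldest_at ||= @clock.call
      @items << item
      take_batch if @items.size >= @max_items
    end
    @sink.call(batch) if batch
  end

  # Call this periodically, for example from a timer thread.
  def tick
    batch = @lock.synchronize do
      take_batch if @oldest_at && @clock.call - @oldest_at >= @max_age
    end
    @sink.call(batch) if batch
  end

  private

  def take_batch
    batch = @items
    @items = []
    @oldest_at = nil
    batch
  end
end

now = 0.0 # fake clock so the example is deterministic
flushed = []
buffer = CountOrTimeBuffer.new(max_items: 3, max_age: 5.0, clock: -> { now }) do |batch|
  flushed << batch.size # stand-in for the batch write
end
3.times { buffer.add(:write) } # count threshold reached
buffer.add(:write)             # one item left waiting
now = 4.0
buffer.tick                    # too young, nothing happens
now = 5.0
buffer.tick                    # age threshold reached
p flushed # => [3, 1]
```

In a real service you would leave the default clock in place and call `tick` from a background thread on a short interval.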

Finding the right latency and throughput numbers for your application is important, so experiment with what makes sense for you. Also, make sure you run tests on the hardware you will be running in production; your laptop numbers will be completely different. The implementation of the accumulated writes technique I am using will not survive a web server crash, so in your production application consider using a durable form of storage like Redis, Riak or Hazelcast instead of an in-memory Ruby array.
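
To show what "durable" buys you, here is a minimal sketch that journals each pending write to a local file before accepting it, so a restarted process can pick up where it left off. This is a swapped-in stand-in, not Redis, Riak or Hazelcast: `WriteJournal` and the file name are made up, a local file does not protect against losing the disk, and it is not shared between web servers.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical append-only journal of pending writes, one JSON document per line.
class WriteJournal
  def initialize(path)
    @path = path
  end

  # Append one command and force it to disk before returning.
  def append(command)
    File.open(@path, 'a') do |f|
      f.puts(JSON.generate(command))
      f.fsync
    end
  end

  # Commands recorded but not yet cleared, for example after a restart.
  # Note that symbols come back as strings after the JSON round trip.
  def pending
    return [] unless File.exist?(@path)
    File.readlines(@path, chomp: true).reject(&:empty?).map { |line| JSON.parse(line) }
  end

  # Call only after the batch has been written to the database.
  def clear
    File.truncate(@path, 0) if File.exist?(@path)
  end
end

path = File.join(Dir.mktmpdir, 'pending_writes.jsonl')
journal = WriteJournal.new(path)
journal.append([:create_node, { 'name' => 'a' }])
journal.append([:create_node, { 'name' => 'b' }])

recovered = WriteJournal.new(path).pending # a fresh object simulates a restart
p recovered.size # => 2
```

The fsync on every append is what makes this slow and safe; that cost is the reason a dedicated store is usually the better choice in production.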
