Neo4j and Gatling sitting in a tree, Performance T-E-S-T-ing


I was introduced to the open-source performance testing tool Gatling a few months ago by Dustin Barnes and fell in love with it. It has an easy-to-use DSL, and even though I don’t know a lick of Scala, I was able to figure out how to use it. It creates pretty awesome graphics and takes care of a lot of work for you behind the scenes. The project has great documentation and a pretty active Google group where newbies and questions are welcome.

It ships with Scala, so all you need to do is create your tests and run them from the command line. I’ll show you how to do a few basic things: first a test that everything is working, then we’ll create nodes and relationships, and finally we’ll query those nodes.

We start things off with the import statements:

import com.excilys.ebi.gatling.core.Predef._
import com.excilys.ebi.gatling.http.Predef._
import akka.util.duration._
import bootstrap._

Then we start right off with our simulation. For this first test, we are just going to get the root node via the REST API. We specify our Neo4j server; in this case I am testing on localhost (you’ll want to run your test code and Neo4j server on different machines when doing this for real). Next we specify that we accept JSON in return. For our test scenario, for a duration of 10 seconds, we’ll get “/db/data/node/0” and check that Neo4j returns the HTTP status code 200 (meaning everything is OK). We’ll pause between 0 and 5 milliseconds between calls to simulate actual users, and in our setup we’ll specify that we want 100 users.

class GetRoot extends Simulation {
  val httpConf = httpConfig
    .baseURL("http://localhost:7474")
    .acceptHeader("application/json")

  val scn = scenario("Get Root")
   .during(10) {
     exec(
       http("get root node")
         .get("/db/data/node/0")
         .check(status.is(200)))
     .pause(0 milliseconds, 5 milliseconds)
   }

  setUp(
    scn.users(100).protocolConfig(httpConf)
  )
}

We’ll call this file “GetRoot.scala” and put it in the user-files/simulations/neo4j directory:

gatling-charts-highcharts-1.4.0/user-files/simulations/neo4j/

We can run our code with:

~$ bin/gatling.sh

We’ll get a prompt asking us which test we want to run:

GATLING_HOME is set to /Users/maxdemarzi/Projects/gatling-charts-highcharts-1.4.0
Choose a simulation number:
     [0] GetRoot
     [1] advanced.AdvancedExampleSimulation
     [2] basic.BasicExampleSimulation

Choose the number next to GetRoot and press enter.

Next you’ll get prompted for an id, or you can just go with the default by pressing enter again:

Select simulation id (default is 'getroot'). Accepted characters are a-z, A-Z, 0-9, - and _

If you want to add a description, you can:

Select run description (optional)

Finally it starts for real:

================================================================================
2013-02-14 17:18:03                                                  10s elapsed
---- Get Root ------------------------------------------------------------------
Users  : [#################################################################]100%
          waiting:0     / running:0     / done:100  
---- Requests ------------------------------------------------------------------
> get root node                                              OK=58457  KO=0     
================================================================================

Simulation finished.
Simulation successful.
Generating reports...
Reports generated in 0s.
Please open the following file : /Users/maxdemarzi/Projects/gatling-charts-highcharts-1.4.0/results/getroot-20130214171753/index.html
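
The summary above is enough for a quick sanity check before opening the report: 58457 OK requests in 10 seconds, spread over 100 users. Here is a minimal sketch of that arithmetic in plain Scala. The object and method names are made up for illustration and are not part of Gatling.

```scala
// Back-of-the-envelope throughput from the console summary above.
// ThroughputEstimate and its members are hypothetical names, not Gatling API.
object ThroughputEstimate {
  // Average requests per second over the whole run.
  def requestsPerSecond(okRequests: Long, elapsedSeconds: Long): Double =
    okRequests.toDouble / elapsedSeconds

  def main(args: Array[String]): Unit = {
    val ok = 58457L    // OK count reported by the run above
    val elapsed = 10L  // seconds elapsed
    val users = 100    // users configured in setUp
    val total = requestsPerSecond(ok, elapsed)
    println(f"average: $total%.1f requests/second")
    println(f"per user: ${total / users}%.1f requests/second")
  }
}
```

This is only an average over the run; the generated report breaks the same data down over time.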

The progress bar measures how many users have completed their task, not how much of the simulation is done, so don’t worry if it stays at zero for a long while and then jumps quickly to 100%. You can also see the OK (requests passed) and KO (requests failed) numbers. Lastly, it creates a great HTML-based report for us. Let’s take a look:

[Screenshot: Gatling HTML report]

Here you can see statistics about the response times as well as the requests per second. So that’s great, we can get the root node, but that’s not very interesting. Let’s create some nodes:

class CreateNodes extends Simulation {
  val httpConf = httpConfig
    .baseURL("http://localhost:7474")
    .acceptHeader("application/json")

  val createNode = """{"query": "create me"}"""

  val scn = scenario("Create Nodes")
    .repeat(1000) {
    exec(
      http("create node")
        .post("/db/data/cypher")
        .body(createNode)
        .asJSON
        .check(status.is(200)))
      .pause(0 milliseconds, 5 milliseconds)
  }


  setUp(
    scn.users(100).ramp(10).protocolConfig(httpConf)
  )
}

In this case, we are setting up 100 users to create 1000 nodes each, with a ramp time of 10 seconds. We’ll run this simulation just like before, but choose CreateNodes. Once it’s done, take a look at the report, and scroll down a bit to see the chart of the Number of Requests per Second:
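
The numbers in this setup pin down the overall load, and it is worth spelling them out. The sketch below is plain Scala with hypothetical helper names (nothing here is Gatling API), and it assumes the ramp starts users evenly over the 10 seconds.

```scala
// Load shape of the CreateNodes simulation, using only numbers from the text.
// LoadShape is a hypothetical helper, not part of Gatling.
object LoadShape {
  // Total requests when every user repeats the request the same number of times.
  def totalRequests(users: Int, repeatsPerUser: Int): Int = users * repeatsPerUser

  // Average number of users started per second, assuming an even ramp.
  def usersPerSecond(users: Int, rampSeconds: Int): Double = users.toDouble / rampSeconds

  def main(args: Array[String]): Unit = {
    println("nodes created: " + totalRequests(100, 1000))
    println("users started per second: " + usersPerSecond(100, 10))
  }
}
```

So one full run creates 100,000 nodes, which is presumably why the later simulations pick random ids up to 100000.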

[Screenshot: Number of Requests per Second chart]

You can see the number of users ramp up over the first 10 seconds and fade at the end. Let’s go ahead and connect some of these nodes together.

We’ll add JSONObject to the import statements, and since I want to see which nodes get linked together, we’ll print the details of each request. I am randomly choosing two ids and passing them to a Cypher query to create the relationship:

import com.excilys.ebi.gatling.core.Predef._
import com.excilys.ebi.gatling.http.Predef._
import akka.util.duration._
import bootstrap._
import util.parsing.json.JSONObject


class CreateRelationships extends Simulation {
  val httpConf = httpConfig
    .baseURL("http://localhost:7474")
    .acceptHeader("application/json")
    .requestInfoExtractor(request => {
      println(request.getStringData)
      Nil
    })


  val rnd = new scala.util.Random
  val chooseRandomNodes = exec((session) => {
    session.setAttribute("params", JSONObject(Map("id1" -> rnd.nextInt(100000),
                                                  "id2" -> rnd.nextInt(100000))).toString())
  })

  val createRelationship = """START node1=node({id1}), node2=node({id2}) CREATE UNIQUE node1-[:KNOWS]->node2"""
  val cypherQuery = """{"query": "%s", "params": %s }""".format(createRelationship, "${params}")


  val scn = scenario("Create Relationships")
    .during(30) {
    exec(chooseRandomNodes)
      .exec(
        http("create relationships")
          .post("/db/data/cypher")
          .header("X-Stream", "true")
          .body(cypherQuery)
          .asJSON
          .check(status.is(200)))
      .pause(0 milliseconds, 5 milliseconds)
  }

  setUp(
    scn.users(100).ramp(10).protocolConfig(httpConf)
  )
}

When you run this, you’ll see a stream of the request bodies we sent in our POST requests:

{"query": "START node1=node({id1}), node2=node({id2}) CREATE UNIQUE node1-[:KNOWS]->node2", "params": {"id1" : 98468, "id2" : 20147} }
{"query": "START node1=node({id1}), node2=node({id2}) CREATE UNIQUE node1-[:KNOWS]->node2", "params": {"id1" : 83557, "id2" : 26633} }
{"query": "START node1=node({id1}), node2=node({id2}) CREATE UNIQUE node1-[:KNOWS]->node2", "params": {"id1" : 22386, "id2" : 99139} }
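
The request body is built in two steps: `format` fills in the Cypher text once, and leaves the literal `${params}` placeholder in place for Gatling to replace from the session on every request. The sketch below imitates that second step with a plain string replacement so it runs without Gatling. The `render` helper and the hand-built params string are stand-ins for illustration, not how Gatling resolves the placeholder internally.

```scala
// How the request body template is assembled, without any Gatling dependency.
object PayloadTemplate {
  val createRelationship =
    """START node1=node({id1}), node2=node({id2}) CREATE UNIQUE node1-[:KNOWS]->node2"""

  // Same template as in the simulation: "${params}" is still a literal placeholder here.
  val cypherQuery: String =
    """{"query": "%s", "params": %s }""".format(createRelationship, "${params}")

  // Hypothetical stand-in for the per-request substitution Gatling performs.
  def render(template: String, params: String): String =
    template.replace("${params}", params)

  def main(args: Array[String]): Unit = {
    val rnd = new scala.util.Random
    // Hand-built JSON object with two random ids, mirroring the session attribute.
    val params = """{"id1" : %d, "id2" : %d}""".format(rnd.nextInt(100000), rnd.nextInt(100000))
    println(cypherQuery)                  // template with the placeholder still in it
    println(render(cypherQuery, params))  // body as it would be sent
  }
}
```

Keeping the ids in "params" rather than splicing them into the query string means the Cypher text is identical on every request.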

You can turn this off, but I wanted to make sure the ids were random, and it helps when debugging. Now we can query the graph. For this next simulation, I want to see the answers returned from Neo4j, and I want to see the nodes related to 10 random nodes passed in as a JSON array. Notice it’s a bit different from before, and we are also checking to see if we got “data” back in our response.

import com.excilys.ebi.gatling.core.Predef._
import com.excilys.ebi.gatling.http.Predef._
import akka.util.duration._
import bootstrap._
import util.parsing.json.JSONArray


class QueryGraph extends Simulation {
  val httpConf = httpConfig
    .baseURL("http://localhost:7474")
    .acceptHeader("application/json")
    .responseInfoExtractor(response => {
      println(response.getResponseBody)
      Nil
    })
    .disableResponseChunksDiscarding

  val rnd = new scala.util.Random
  val nodeRange = 1 to 100000
  val chooseRandomNodes = exec((session) => {
    session.setAttribute("node_ids", JSONArray.apply(List.fill(10)(nodeRange(rnd.nextInt(nodeRange length)))).toString())
  })

  val getNodes = """START nodes=node({ids}) MATCH nodes -[:KNOWS]-> other_nodes RETURN ID(other_nodes)"""
  val cypherQuery = """{"query": "%s", "params": {"ids": %s}}""".format(getNodes, "${node_ids}")

  val scn = scenario("Query Graph")
    .during(30) {
    exec(chooseRandomNodes)
      .exec(
        http("query graph")
          .post("/db/data/cypher")
          .header("X-Stream", "true")
          .body(cypherQuery)
          .asJSON
          .check(status.is(200))
          .check(jsonPath("data")))
      .pause(0 milliseconds, 5 milliseconds)
  }

  setUp(
    scn.users(100).ramp(10).protocolConfig(httpConf)
  )
}
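
The id selection in chooseRandomNodes is dense, so here it is pulled apart in plain Scala. `rnd.nextInt(nodeRange.length)` yields an index from 0 to 99999, which maps to an id from 1 to 100000, and because each of the ten picks is independent the same id can appear more than once. The `toJsonArray` helper is a hand-rolled stand-in for `JSONArray(...).toString()`, and the object name is made up for illustration.

```scala
// Picking ten random node ids and rendering them as a JSON array.
// RandomNodeIds is a hypothetical helper, not part of Gatling.
object RandomNodeIds {
  val nodeRange = 1 to 100000

  // Pick `count` ids from nodeRange with replacement (duplicates are possible).
  def pick(rnd: scala.util.Random, count: Int): List[Int] =
    List.fill(count)(nodeRange(rnd.nextInt(nodeRange.length)))

  // Render the ids as a JSON array by hand.
  def toJsonArray(ids: Seq[Int]): String = ids.mkString("[", ", ", "]")

  def main(args: Array[String]): Unit = {
    val ids = pick(new scala.util.Random, 10)
    println(toJsonArray(ids))
  }
}
```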

If we take a look at the details tab for this simulation we see a small spike in the middle:

[Screenshot: Gatling details tab showing the spike]

This is a tell-tale sign of a JVM garbage collection taking place, and we may want to look into that. Edit your neo4j/conf/neo4j-wrapper.conf file and uncomment the garbage collection logging line, then add timestamps to gain better visibility into the issue:

# Uncomment the following line to enable garbage collection logging
wrapper.java.additional.4=-Xloggc:data/log/neo4j-gc.log
wrapper.java.additional.5=-XX:+PrintGCDateStamps

Neo4j performance tuning deserves its own blog post, but at least now you have a great way of testing your performance as you tweak JVM, cache, hardware, load balancing, and other parameters. Don’t forget that while testing Neo4j directly is pretty cool, you can use Gatling to test your whole web application too and measure end-to-end performance.
