Let’s build something Outrageous – Part 7: Performance

I don’t know why, but I need things to be fast. Nothing drives me up the wall more than waiting a long time for some database query to finish. It’s some kind of disease, I tell you. So today we’re going to do a little performance testing to see where RageDB is at. So far all we can do is create nodes and relationships, but that’s enough to get us started.

My favorite HTTP performance testing tool in the whole world is Gatling. I don’t even know Scala, but that doesn’t slow me down writing performance tests, since the API just vibes with me. I used the Maven archetype and put a little project together. You can check it out on GitHub. I like to start out with a set of Test Parameters. These have sane defaults, but you can override them with environment variables or on the command line when running the benchmarks.

class TestParameters {
  def getProperty(propertyName: String, defaultValue: String): String = {
    Option(System.getenv(propertyName))
      .orElse(Option(System.getProperty(propertyName)))
      .getOrElse(defaultValue)
  }

  def rageURL: String = getProperty("RAGE_URL", "http://localhost:7243")
  def rageDB: String = getProperty("RAGE_DB", "rage")
  def userCount: Int = getProperty("USERS", "32").toInt
  def testDuration: Int = getProperty("DURATION", "60").toInt
  def fromId: Int = getProperty("FROM_ID", "0").toInt
  def toId: Int = getProperty("TO_ID", "3000000").toInt
}
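One thing worth noting about that lookup order: an environment variable wins over a JVM system property, which wins over the default. Here is a stand-alone sketch of that resolution (RAGE_DEMO_USERS is a made-up property name just for this demo, and since env vars can’t be set from inside a running JVM, it exercises the system-property path):

```scala
// Same lookup order as TestParameters.getProperty: environment variable
// first, then JVM system property, then the supplied default.
object ParamLookupDemo {
  def getProperty(propertyName: String, defaultValue: String): String =
    Option(System.getenv(propertyName))
      .orElse(Option(System.getProperty(propertyName)))
      .getOrElse(defaultValue)

  def main(args: Array[String]): Unit = {
    // Nothing set yet: falls through to the default.
    println(getProperty("RAGE_DEMO_USERS", "32"))  // 32
    // A system property overrides the default...
    System.setProperty("RAGE_DEMO_USERS", "64")
    println(getProperty("RAGE_DEMO_USERS", "32"))  // 64
    // ...and an env var, if one were set, would override both.
  }
}
```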

The tests will run for 60 seconds with 32 users, which by Gatling’s defaults open up 6x as many connections, for a total of 192 concurrent requests. The default server URL is localhost, but we won’t test against a local server; that’s always a bad idea. Instead we’ll set up a c5.2xlarge instance as our load generator and run RageDB on an r5.2xlarge configured without hyper-threading, so we get a proper 4 cores. Both instances are in the same availability zone, so networking isn’t an issue. The 0 to 3 million range will come into play later.

For our first test, we will create a bunch of nodes without any properties. Just empty nodes with a key and label but that’s it.

class CreateNodeUniformNoProperties extends Simulation {
...
  val id = new AtomicLong(0)
  val incrementalFeeder: Iterator[Map[String, Long]] = 
        Iterator.continually(Map("uuid" -> id.getAndIncrement()))

...

  def request: HttpRequestBuilder = {
    http("CreateNodeUniformNoProperties")
      .post("/db/" + params.rageDB + "/node/Node/${uuid}")
      .check(status.is(201))
  }

  val scn: ScenarioBuilder = scenario("rage.CreateNodeUniformNoProperties")
    .during(params.testDuration) {
      feed(incrementalFeeder)
        .exec(request)
    }
...
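The feeder, by the way, is just a plain Scala iterator backed by an AtomicLong, so concurrent virtual users each grab a distinct key with no coordination beyond the atomic increment. A stand-alone sketch of that behavior (no Gatling required):

```scala
import java.util.concurrent.atomic.AtomicLong

// Same shape as the incremental feeder above: every call to next() hands
// out the next key, atomically, so concurrent users never collide.
object FeederDemo {
  private val id = new AtomicLong(0)
  val feeder: Iterator[Map[String, Long]] =
    Iterator.continually(Map("uuid" -> id.getAndIncrement()))

  def main(args: Array[String]): Unit =
    feeder.take(3).foreach(println)  // uuid -> 0, then 1, then 2
}
```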

The test starts by creating a node with the label “Node” (yes, I know, I’m super creative) and the key “0”, incrementing the key on every request. Here are the results after a minute:

---- Global Information --------------------------------------------------------
> request count                                    6334492 (OK=6334492 KO=0     )
> min response time                                      0 (OK=0      KO=-     )
> max response time                                     88 (OK=88     KO=-     )
> mean response time                                     0 (OK=0      KO=-     )
> std deviation                                          1 (OK=1      KO=-     )
> response time 50th percentile                          0 (OK=0      KO=-     )
> response time 75th percentile                          0 (OK=0      KO=-     )
> response time 95th percentile                          1 (OK=1      KO=-     )
> response time 99th percentile                          1 (OK=1      KO=-     )
> mean requests/sec                                103844.131 (OK=103844.131 KO=-     )

That’s 6.3 million empty nodes created at a rate of about 100k per second, with a mean latency of less than 1 millisecond and a max of 88ms. That’s not bad at all. Let’s see how fast we can retrieve them back.

class GetNodeUniform extends Simulation {
...
def request: HttpRequestBuilder = {
    http("GetNodeUniform")
      .get("/db/" + params.rageDB + "/node/Node/${uuid}")
      .check(status.is(200))
  }

This test gets the nodes by label and key. Let’s give it a whirl:

---- Global Information --------------------------------------------------------
> request count                                    6289562 (OK=6289562 KO=0     )
> min response time                                      0 (OK=0      KO=-     )
> max response time                                     68 (OK=68     KO=-     )
> mean response time                                     0 (OK=0      KO=-     )
> std deviation                                          0 (OK=0      KO=-     )
> response time 50th percentile                          0 (OK=0      KO=-     )
> response time 75th percentile                          0 (OK=0      KO=-     )
> response time 95th percentile                          1 (OK=1      KO=-     )
> response time 99th percentile                          1 (OK=1      KO=-     )
> mean requests/sec                                103107.574 (OK=103107.574 KO=-     )

Almost the same speed: 100k requests per second with a mean latency under 1ms and a max of 68ms. Alright, let’s make some relationships:

class CreateRelationshipsForUniformNodesNoProperties extends Simulation {
...
  val start: Int = params.fromId
  val end: Int = params.toId
  val incrementalFeeder: Iterator[Map[String, Long]] = Iterator.continually(Map(
    "uuid_1" -> Random.between(start, end),
    "uuid_2" -> Random.between(start, end),
    "rel_type_id" -> Random.between(1, 10)
  ))
...
  def request: HttpRequestBuilder = {
    http("CreateRelationshipsForUniformNodesNoProperties")
      .post("/db/" + params.rageDB + 
            "/node/Node/${uuid_1}/relationship/Node/${uuid_2}/TYPE_${rel_type_id}")
      .check(status.is(201))
  }

We’ll create as many relationships as we can for a minute between random nodes from 0 to 3 million, using one of nine relationship types, TYPE_1 through TYPE_9 (Random.between includes its lower bound but excludes its upper one).
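A quick empirical check of those feeder bounds, since scala.util.Random.between(start, end) is start-inclusive and end-exclusive:

```scala
import scala.util.Random

// Sample the relationship-type feeder's expression many times and look
// at the extremes: between(1, 10) never produces 10.
object BoundsDemo {
  def sampledRange(samples: Int): (Int, Int) = {
    val values = (1 to samples).map(_ => Random.between(1, 10))
    (values.min, values.max)
  }

  def main(args: Array[String]): Unit =
    println(sampledRange(10000))  // (1,9)
}
```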

---- Global Information --------------------------------------------------------
> request count                                    5525924 (OK=5525924 KO=0     )
> min response time                                      0 (OK=0      KO=-     )
> max response time                                     64 (OK=64     KO=-     )
> mean response time                                     0 (OK=0      KO=-     )
> std deviation                                          1 (OK=1      KO=-     )
> response time 50th percentile                          0 (OK=0      KO=-     )
> response time 75th percentile                          1 (OK=1      KO=-     )
> response time 95th percentile                          1 (OK=1      KO=-     )
> response time 99th percentile                          1 (OK=1      KO=-     )
> mean requests/sec                                90588.918 (OK=90588.918 KO=-     )

We’re at 90k relationships created per second, for about 5.5 million relationships. Remember that we’re on a 4-core server, and with two uniformly random endpoints, both ends of a relationship land on the same core only about a quarter of the time; so most of the requests will need two cores to work together to create the relationship and its relationship chains. Alright, now let’s get these relationships back:

class GetRelationshipsFromNodeUniform extends Simulation {
...
  val start = 0
  val end   = 3000000
  val incrementalFeeder: Iterator[Map[String, Long]] = 
         Iterator.continually(Map("uuid" -> Random.between(start, end)))
...
  def request: HttpRequestBuilder = {
    http("GetRelationshipsFromNodeUniform")
      .get("/db/" + params.rageDB + "/node/Node/${uuid}/relationships")
      .check(status.is(200))
  }

We’re not going to get the relationships directly. Instead we’re going to use the label and key to reach a node from 0 to 3 million and ask for its relationships: both incoming and outgoing, of all types.

---- Global Information --------------------------------------------------------
> request count                                    4822170 (OK=4822170 KO=0     )
> min response time                                      0 (OK=0      KO=-     )
> max response time                                     67 (OK=67     KO=-     )
> mean response time                                     0 (OK=0      KO=-     )
> std deviation                                          1 (OK=1      KO=-     )
> response time 50th percentile                          0 (OK=0      KO=-     )
> response time 75th percentile                          1 (OK=1      KO=-     )
> response time 95th percentile                          1 (OK=1      KO=-     )
> response time 99th percentile                          1 (OK=1      KO=-     )
> mean requests/sec                                79051.967 (OK=79051.967 KO=-     )

Down to about 80k requests per second. But remember, we are not just returning nodes or relationships here; we’re actually traversing. Still, I think our numbers could be higher. No, I don’t mean I need to go back and change the code (but we will do that eventually). I mean that as much as I love Gatling, it isn’t the fastest performance testing tool out there. Let’s try to get the nodes again, but this time we will use wrk. Instead of Scala, we will use Lua to create our test script. This is what getting a random node from 1 to 5 million looks like:

-- Seed each wrk thread differently: current time plus the hex digits of
-- a fresh table's address (tostring({}) looks like "table: 0x7f..."),
-- so concurrent threads don't all generate the same key sequence.
math.randomseed(os.time() + tonumber(tostring({}):sub(8)))

-- Build a request for a random node key between 1 and 5 million.
-- Passing nil as the method makes wrk.format default to GET.
request = function()
   local path = "/db/rage/node/Node/" .. math.random(5000000)
   return wrk.format(nil, path)
end

Our original numbers had us at about 100k requests per second with a max of 68ms. What about wrk?

./wrk -c 192 -t 32 -d 60s -s rage_get_node.lua --latency http://x.x.x.x:7243
Running 1m test @ http://x.x.x.x:7243
  32 threads and 192 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.00ms  310.67us   9.25ms   66.99%
    Req/Sec     5.93k   127.38     6.85k    71.12%
  Latency Distribution
     50%    1.08ms
     75%    1.23ms
     90%    1.33ms
     99%    1.54ms
  11354753 requests in 1.00m, 2.39GB read
Requests/sec: 188931.40
Transfer/sec:     40.66MB

188k requests per second with a 99th percentile latency of 1.54ms. That’s an 80% improvement over the Gatling version, and all we did was change the performance tool. There is one problem with wrk however: Coordinated Omission. If you know what that term means, please continue on; if this is your first time hearing of it, please find an hour and 45 minutes and watch this video from Gil Tene. I promise you will learn something that will change the way you look at performance testing forever.
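In short: a closed-loop tester like wrk waits for each response before sending the next request, so when the server stalls, the tester quietly stops sampling and the stall barely shows up in the numbers. Here is a toy simulation of the effect, just arithmetic, not a benchmark: requests are meant to go out every 1ms, the service is instant except for one 100ms stall, and we compare a service-time view (what a closed-loop tester records) with a response-time view measured from each request’s intended start.

```scala
// Toy model of coordinated omission. Requests are scheduled every
// `interval` ms; one request stalls for `stallMs`. "Service time" is what
// a closed-loop tester measures (time spent in the server); "response
// time" is measured from each request's *intended* start, charging the
// stall to everything queued behind it.
object OmissionDemo {
  def simulate(n: Int, interval: Int, stallAt: Int, stallMs: Int): (Seq[Long], Seq[Long]) = {
    var clock = 0L  // simulated wall clock, in ms
    (0 until n).map { i =>
      val intendedStart = i.toLong * interval
      val actualStart   = math.max(clock, intendedStart)  // can't send while blocked
      val service       = if (i == stallAt) stallMs.toLong else 0L
      clock = actualStart + service
      (service, clock - intendedStart)
    }.unzip
  }

  def main(args: Array[String]): Unit = {
    val (svc, resp) = simulate(n = 200, interval = 1, stallAt = 50, stallMs = 100)
    // The stall is a single bad sample in the service-time view, but
    // about 90 bad samples once the delayed requests are accounted for.
    println(s"samples over 10ms, service-time view:  ${svc.count(_ > 10)}")
    println(s"samples over 10ms, response-time view: ${resp.count(_ > 10)}")
  }
}
```

Measured the first way, the run looks healthy; measured the second way, you see what a client arriving at any moment would actually have experienced, which is what wrk2’s HdrHistogram output reports.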

So what can we do? Well, we can switch to wrk2. It works the same way as wrk (it even builds a binary still named wrk, which is why the command below looks unchanged), except you specify the rate of requests and it tells you how well the system under load handled it. The wrk run gave us about 190k requests per second, so let’s try that. We are going to up the duration to 70 seconds, since wrk2 takes 10 seconds to calibrate, and add the -R flag to set the rate at 190k:

./wrk -c 192 -t 32 -d 70s -s rage_get_node.lua --latency -R 190000 http://x.x.x.x:7243
Running 1m test @ http://x.x.x.x:7243
  32 threads and 192 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.31ms  565.03us  11.08ms   74.35%
    Req/Sec     6.26k   400.79     8.67k    73.88%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    1.23ms
 75.000%    1.60ms
 90.000%    1.99ms
 99.000%    2.92ms
 99.900%    5.84ms
 99.990%    8.65ms
 99.999%    9.88ms
100.000%   11.09ms
  13297120 requests in 1.17m, 2.79GB read
Requests/sec: 189969.52
Transfer/sec:     40.88MB

From these numbers, RageDB can handle the load at 190k r/s while staying at a maximum of about 11ms. Let’s keep going: what happens if we try to handle a sustained rate of 195k r/s?

./wrk -c 192 -t 32 -d 70s -s rage_get_node.lua --latency -R 195000 http://x.x.x.x:7243
Running 1m test @ http://x.x.x.x:7243
  32 threads and 192 connections
Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   206.04ms  325.39ms   1.40s    83.10%
    Req/Sec     6.09k    85.74     6.66k    77.55%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%   25.55ms
 75.000%  242.69ms
 90.000%  771.07ms
 99.000%    1.24s 
 99.900%    1.37s 
 99.990%    1.38s 
 99.999%    1.39s 
100.000%    1.40s   
  13587281 requests in 1.17m, 2.86GB read
Requests/sec: 194114.42
Transfer/sec:     41.77MB

We fail! Our 50th percentile latency jumps from 1ms to 25ms and our 99th from 3ms to 1.24 seconds! So we’ve found our true limit under this workload and hardware. If you want to play with RageDB yourself, the source repository and the Gatling test code are available. The Lua for wrk and wrk2 is pasted above.

If you want to join me on Slack, follow this invite code and pop in to say hello. That’s it for now until the next installment.
