Benchmarks and Superchargers

For the most part, I hate competitive benchmarks. The vendor who publishes them always seems to come out on top regardless. The numbers are always amazing, but once you start digging in a little bit you start to see faults in what is actually being measured and it never applies to real world workloads. For example you have Cassandra claiming 1 Million writes per second on 300 servers. Then Aerospike claiming 1 Million writes per second on 50 servers. MongoDB claiming almost 32k writes per second on a single server, but claiming Cassandra can only do 6k w/s and Couch can only do 1.2k w/s on a single server… Then ScyllaDB has almost 2 Million writes per second on 3 servers blowing everybody away.

Michael Hunger wrote a test where Neo4j can write over 1 Million writes per second on a single server using Cypher. But we don’t post performance tests that don’t reflect real world graph workloads. In fact, it’s pretty hard to design real tests that don’t bias any one company or technology. An effort in this direction is being taken by the Linked Data Benchmark Council. Go take a look at their work. Also remember that nobody needs millions of writes per second on an Operational Datastore. That volume of data is best sent to your big data cluster for analytics and reporting.

Recently ArangoDB posted a series of benchmarks showing how awesome ArangoDB is and how much everybody else sucks. OrientDB contested those benchmarks with new results showing how awesome OrientDB is at caching results.

The benchmark data set is 2 Million Nodes and 30 Million Relationships. At that size, you could put it on SQLite and live happily ever after, so it’s not very realistic. Anyway, let’s take a look at some of the results… let’s begin with Shortest Path.
INFO Neo4J: shortest path, 40 items INFO Total Time for 40 requests: 400 ms

Hum… 400ms to run 40 requests for Neo4j versus 60ms for ArangoDB. Doesn’t look too good… I wonder what happens if I run it again.
INFO Neo4J: shortest path, 40 items INFO Total Time for 40 requests: 77 ms

Whoa… that dropped from 400ms to 77ms. What’s going on? According to their testing methodology they average 5 runs, restarting the server each time (because that’s totally what you would do in a production environment). What is Neo4j doing the first time vs the second time that test is run? Let’s start by taking a look at the query:

 
MATCH (s:PROFILES {_key:'P10000'}),(t:PROFILES {_key:'P10001'}),
p = shortestPath((s)-[*..15]->(t)) 
RETURN [x in nodes(p) | x._key] as path

This Cypher query needs to be converted into code… it happens in the following steps:

Convert the input query string into an abstract syntax tree (AST)
Optimize and normalize the AST
Create a query graph from the normalized AST
Create a logical plan from X
Rewrite the logical plan
Create an execution plan from the logical plan
Execute the query using the execution plan

For those of you who are more visual:

Let’s put “PROFILE” in front of that query and see what Neo4j is actually doing:

This execution plan becomes Scala code that is interpreted and run. There is a new compiled runtime that will turn that execution plan into java byte code, but it doesn’t know how to handle this type of query yet. The code for this query is cached, so the next time Neo4j sees the same query (even with different parameters) it will just run it instead of figuring it out again. This is the main reason the second time we run this test, it goes much faster. However there is another reason:

ArangoDB is written in C++, so it doesn’t have to deal with JVM warm-up. Victor Grazi explains:

A well-known feature of the Java platform is that as a Java application launches and executes, the JVM compiles the application into executable machine code. As the application continues to run, the JVM will evaluate the execution history of the application and recompile important parts of the application code to improve performance. Consequently, an application’s performance will improve over time.

In a production environment, this is not a problem since databases see the same queries all the time and this code gets optimized. These tests however don’t do that.

So what’s the right value 400ms or 77ms… I’ll leave that up to you to decide. Now, this blog post is titled “Benchmarks and Superchargers” and I haven’t talked about Superchargers at all, so what gives. Well, the one thing I absolutely love love love about Neo4j is that it is a Hot Rod.

By that I mean that it’s completely customizable. You can add API Extensions, Plugins, Kernel Extensions, your own Cypher Functions, your own modified Kernel, its completely embeddable, so pretty much what-ever-you-want.

So lets add a SuperCharger “Guacamole” extension to Neo4j to handle some of this work. I compiled the jar file, added it to the plugins directory, configured it and restarted the server. Now I modified the javascript tests and added Neo4jExt, so I can run this command:

 
nodejs benchmark neo4jext -a 10.240.0.2 -t shortest

Now instead of getting 400ms I get:

INFO Neo4JExt: shortest path, 40 items INFO Total Time for 40 requests: 120 ms

Second time:

INFO Neo4JExt: shortest path, 40 items INFO Total Time for 40 requests: 61 ms

Same speed as ArangoDB. What if I run it a few more times…

INFO Neo4JExt: shortest path, 40 items INFO Total Time for 40 requests: 55 ms

Nice… now Neo4j is 10% faster than ArangoDB.

Let’s look at the Neighbors query and what ArangoDB thinks of the subject:

Looks like a case for graph databases, but isn’t necessarily. At least Neo4j and OrientDB can’t stand out in this test – despite it’s a simple graph traversal. ArangoDB is really fast using just 464ms in AVG, no graph database comes close. That’s because lookups in a graph of a known length is faster with an Index lookup than using outbound links in every vertex.

Ah… there it is! With Just 1.6 Million users and about 30 Million relationships in this dataset an index look-up is faster than a traversal. What happens when you get to 10 Million users? 100 Million Users or even 1 Billion Users and you have 20 Billion relationships to look up in an index. Oh that’s right. A Traversal is NOW much faster than an index look-up. A B-Tree Index lookup is O(log n). Which means as your data grows by a factor of 10, your speed slows down by n for every join (10=1x,100=2x,1000=3x, etc). So as your data grows, the index keeps getting fatter and fatter, it starts to slow down and then one day, it doesn’t matter how much money you throw at Oracle RAC, it’s just not going to be able to handle it.

Relational Databases can’t scale joins. This is the whole reason the NoSQL movement got started. Key Value stores don’t have them. Document Databases replicate data by aggregating whole objects so you don’t have to join. Column Stores replicate data per query so you don’t have to join. This is why you end up with 300 server clusters of these types of databases, while Neo4j clusters tend to be in the 3-7 server range. Neo4j doesn’t have to duplicate data, and doesn’t have monster indexes to deal with.

I was able to get “Neighbors2” down from 2.3s to 1.7s and Aggregation from 2.8s to 1.3s. In the case of aggregation I’m using a non-public API (naughty me) but I haven’t gotten around to parallelizing it yet. Yeah that’s right, you can make your queries parallel if you want to (see this repo for an example of parallel pagerank for now). I haven’t actually tried it yet, but it’s also possible to distribute a particularly heavy query across a cluster. Imagine starting at a user, getting their friends and then for each friend sending friend specific requests to yourself and the rest of the cluster, then aggregating them all together.

Another thing I didn’t do is embed Neo4j in a high speed web server like Undertow to cut down some of my http overhead or switch to a socket based interface to skip it altogether. I could also cheat and add my very own “Command Cache” using Google Guava.

With Neo4j it’s always a lovely day to customize your database! So ride eternal shiny and chrome!

One thought on “Benchmarks and Superchargers”

Why Native Graph Matters: The Co-Evolution of Computer Hardware & Graph Databases says:

June 3, 2019 at 11:38 PM

[…] year or so ago, Max De Marzi did some very interesting benchmarks about native and non-native graphs. There’s something very important about the algorithmic and mechanical efficiency of native […]

Max De Marzi

Graphs, Graphs, and nothing but the Graphs