Caching Immutable Id lookups in Neo4j


If you’ve been following my blog for a while, you probably know I like using YourKit and Gatling for testing end-to-end requests in Neo4j. Today, however, we are going to do something a little different: we are going to micro-benchmark a very small piece of code within our Unmanaged Extension using a Java library called JMH.

In Neo4j 2.2 and up, the Java API has been made a little easier to use. In 2.1 you had to do this to get a node:

final Node node = IteratorUtil.singleOrNull(
      db.findNodesByLabelAndProperty(Labels.Segment, "segmentId", segmentId));

Or even worse, if you didn’t know about the IteratorUtil.singleOrNull helper:

final ResourceIterable<Node> nodes = 
       db.findNodesByLabelAndProperty(Labels.Segment, "segmentId", segmentId);

final Node node;

try (ResourceIterator<Node> nodeIter = nodes.iterator()) {
     node = (nodeIter.hasNext() ? nodeIter.next() : null);
}

Now you can simply do this:

final Node node = db.findNode(Labels.Segment, "segmentId", segmentId);

This new helper makes finding a node by Label and property much easier. Hopefully you’ve created Unique Constraints or Indexes for all the properties you’ll use to look up nodes. Now let’s see just how fast this runs. We’re going to create a method in an Unmanaged Extension that takes a list of ids and looks them up. We’ll collect the node ids it finds and use those as our result.

    @POST
    @Path("/lookup")
    public String Lookup(String body, @Context GraphDatabaseService db) throws IOException {
        ArrayList<Long> idsFound = new ArrayList<>();
        HashMap<String, List<Integer>> input;
        input = objectMapper.readValue(body, HashMap.class);
        List<Integer> segmentIds = input.get("segmentIds");

        try (Transaction tx = db.beginTx()) {
            for (int segmentId : segmentIds) {
                final Node node = db.findNode(Labels.Segment, "segmentId", segmentId);

                if (node != null) {
                    idsFound.add(node.getId());
                }
            }
            tx.success();
        }

        return idsFound.toString();
    }

Why am I using a list of ids instead of a single lookup? Because I want to show you what happens as two variables change: one, the number of nodes in our index, and two, the number of lookups done in a single method.

@State(Scope.Benchmark)
public class ServiceTest {

    private Service service;
    private GraphDatabaseService db;

    @Param({"10000", "100000", "1000000"})
    public int size;

We’ll use @Param to try three different graph sizes: 10k, 100k, and 1M nodes. We’ll use the size parameter when we create our graph:

    public void populate(GraphDatabaseService db) {
        Transaction tx = db.beginTx();
        try {
            for (int loop = 1; loop <= size; loop++) {
                Node identity = db.createNode(Labels.Segment);
                identity.setProperty("segmentId", loop);

                if(loop % 1_000 == 0){
                    tx.success();
                    tx.close();
                    tx = db.beginTx();
                }
            }
            tx.success();
        } finally {
            tx.close();
        }
    }

You’ll notice I commit and start a new transaction every 1000 nodes created. I do this because we don’t want to commit 1 million nodes all at once; it eats up too much memory. Before each test I set up my graph, migrate it (which creates the constraints), and populate it.

    @Setup
    public void prepare() {
        db = new TestGraphDatabaseFactory().newImpermanentDatabase();
        service = new Service(db);
        service.migrate(db);
        populate(db);
    }
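
The migrate call isn’t shown here, but creating the unique constraint through Neo4j’s schema API would look something like this (a minimal sketch, not the repo’s actual implementation; note that schema changes need their own transaction, separate from data writes):

    public void migrate(GraphDatabaseService db) {
        // Sketch: create a unique constraint on :Segment(segmentId) so that
        // findNode is backed by an index. Skip it if one already exists.
        try (Transaction tx = db.beginTx()) {
            if (!db.schema().getConstraints(Labels.Segment).iterator().hasNext()) {
                db.schema().constraintFor(Labels.Segment)
                        .assertPropertyIsUnique("segmentId")
                        .create();
            }
            tx.success();
        }
    }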

Now we’re finally ready to create a benchmark. Our benchmark will warm up 10 times, perform 5 measurement iterations, fork the JVM 3 times, and report the average time it took to run.

    @Benchmark
    @Warmup(iterations = 10)
    @Measurement(iterations = 5)
    @Fork(3)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public void measureOneLookup() throws IOException {
        service.Lookup("{\"segmentIds\":[9781]}", db);
    }
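
I run the benchmarks from IntelliJ with the JMH Plugin, but you don’t need an IDE: JMH’s standard Runner API (org.openjdk.jmh.runner) will launch them from a plain main method. A minimal sketch:

    public static void main(String[] args) throws RunnerException {
        // Run every benchmark in this class with the annotated settings.
        Options opt = new OptionsBuilder()
                .include(ServiceTest.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }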

After about 5 minutes we get our results:

Benchmark                      (size)  Mode  Cnt  Score   Error  Units
ServiceTest.measureOneLookup    10000  avgt   15  0.032 ± 0.000  ms/op
ServiceTest.measureOneLookup   100000  avgt   15  0.035 ± 0.001  ms/op
ServiceTest.measureOneLookup  1000000  avgt   15  0.042 ± 0.001  ms/op

So a single lookup completes in 0.032 ms to 0.042 ms. Practically no time at all. It does get a little slower as our size gets bigger, but that makes sense. Now let’s try measuring 10 lookups in the same method.
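
The ten-lookup benchmark isn’t shown here, but it presumably just sends ten ids in a single payload; a sketch (the specific ids are my invention):

    @Benchmark
    @Warmup(iterations = 10)
    @Measurement(iterations = 5)
    @Fork(3)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public void measureTenLookups() throws IOException {
        // Ten arbitrary ids in one request body.
        service.Lookup("{\"segmentIds\":[1,2,3,4,5,6,7,8,9,10]}", db);
    }

The results: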

Benchmark                       (size)  Mode  Cnt  Score   Error  Units
ServiceTest.measureTenLookups    10000  avgt   15  0.102 ± 0.001  ms/op
ServiceTest.measureTenLookups   100000  avgt   15  0.103 ± 0.002  ms/op
ServiceTest.measureTenLookups  1000000  avgt   15  0.112 ± 0.002  ms/op

Our numbers moved up, but they are still pretty reasonable. One more time, with 1000 lookups in the same method.
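
Nobody types a thousand ids by hand, so presumably the payload is built once, outside the measured code; a sketch:

    // Build the 1000-id JSON payload a single time, not inside the benchmark.
    private static final String THOUSAND_IDS;
    static {
        StringBuilder sb = new StringBuilder("{\"segmentIds\":[1");
        for (int id = 2; id <= 1000; id++) {
            sb.append(',').append(id);
        }
        THOUSAND_IDS = sb.append("]}").toString();
    }

    @Benchmark
    @Warmup(iterations = 10)
    @Measurement(iterations = 5)
    @Fork(3)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public void measure1000Lookups() throws IOException {
        service.Lookup(THOUSAND_IDS, db);
    }

And the results: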

Benchmark                        (size)  Mode  Cnt  Score   Error  Units
ServiceTest.measure1000Lookups    10000  avgt   15  7.781 ± 0.235  ms/op
ServiceTest.measure1000Lookups   100000  avgt   15  8.223 ± 0.166  ms/op
ServiceTest.measure1000Lookups  1000000  avgt   15  9.273 ± 0.101  ms/op

Now we are up to 9ms or so to make these 1000 lookups. In the great majority of use cases, a traversal will start from just one starting node, or a few of them. In the rare case where the traversal needs to look at data from hundreds of nodes, it may be worth the effort to add a cache. So let’s give that a shot and benchmark the same tests to compare.

    private static final LoadingCache<Integer, Long> segments = CacheBuilder.newBuilder()
            .maximumSize(1_000_000)
            .build(
                    new CacheLoader<Integer, Long>() {
                        public Long load(Integer segmentId) {
                            return getSegmentNodeId(segmentId);
                        }
                    });

    // Note: db must be a static field on the Service for this static loader to compile.
    private static Long getSegmentNodeId(Integer segmentId) {
        final Node node = db.findNode(Labels.Segment, "segmentId", segmentId);
        return node.getId();
    }

Up here we are creating a LoadingCache that takes an Integer (the segmentId) as its key and returns a node id as the cached value. It is important that the cache works this way and does not cache whole nodes. If you cache whole nodes, then you have to worry about cache invalidation every time you add or remove a node or change a property. By caching just the node ids, we don’t have to worry (assuming the property values we use as keys are immutable). So let’s take a look at our cached lookup implementation:

    @POST
    @Path("/cachedlookup")
    public String CachedLookup(String body, @Context GraphDatabaseService db) throws IOException, ExecutionException {
        ArrayList<Long> idsFound = new ArrayList<>();
        HashMap<String, List<Integer>> input;
        input = objectMapper.readValue(body, HashMap.class);
        List<Integer> segmentIds = input.get("segmentIds");

        try (Transaction tx = db.beginTx()) {
            for (int segmentId : segmentIds) {
                final Node node = db.getNodeById(segments.get(segmentId));

                if (node != null) {
                    idsFound.add(node.getId());
                }
            }
            tx.success();
        }

        return idsFound.toString();
    }

As you can see, it takes the node id from the cache and simply calls getNodeById to go from there. So let’s run our benchmarks again and see what we get:

Benchmark                            (size)  Mode  Cnt  Score   Error  Units
ServiceTest.measureOneCachedLookup    10000  avgt   15  0.011 ± 0.000  ms/op
ServiceTest.measureOneCachedLookup   100000  avgt   15  0.011 ± 0.000  ms/op
ServiceTest.measureOneCachedLookup  1000000  avgt   15  0.012 ± 0.000  ms/op

A single lookup here takes about a third of the time of the index lookup, though both are very small amounts of time. When we bump up to 10 lookups we are almost as fast as a single lookup and roughly 8x faster than using the index for lookups.

Benchmark                             (size)  Mode  Cnt  Score   Error  Units
ServiceTest.measureTenCachedLookups    10000  avgt   15  0.013 ± 0.000  ms/op
ServiceTest.measureTenCachedLookups   100000  avgt   15  0.013 ± 0.000  ms/op
ServiceTest.measureTenCachedLookups  1000000  avgt   15  0.014 ± 0.000  ms/op

Now once we move up to 1000 lookups we see a real difference: 9 ms vs 0.3 ms. That’s roughly 30x better than our original attempt.

Benchmark                              (size)  Mode  Cnt  Score   Error  Units
ServiceTest.measure1000CachedLookups    10000  avgt   15  0.323 ± 0.003  ms/op
ServiceTest.measure1000CachedLookups   100000  avgt   15  0.320 ± 0.004  ms/op
ServiceTest.measure1000CachedLookups  1000000  avgt   15  0.316 ± 0.003  ms/op

If you’ve seen the Neo4j Internals blog post, you will immediately recognize why it’s so much faster. In the search case we are looking up a node id in a b-tree; in the cached case we are going to it almost directly. This is the same reason Neo4j is so much faster at joins than your RDBMS: it doesn’t have to look up foreign keys in an index, it goes to them directly.

So the takeaway here is: only resort to this type of internal caching if you are performing lots of lookups in your queries and performance is critical. One last thing… if you do delete nodes, make sure to call cache.invalidate(key) just to be safe.
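
A hypothetical delete path (not from this post) would evict the cached id in the same transaction that removes the node:

    public void deleteSegment(int segmentId, GraphDatabaseService db) {
        try (Transaction tx = db.beginTx()) {
            final Node node = db.findNode(Labels.Segment, "segmentId", segmentId);
            if (node != null) {
                node.delete();
                // Evict the cached id so the cache can't hand back a stale node id.
                segments.invalidate(segmentId);
            }
            tx.success();
        }
    }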

As always, code is available on github.

Update: Can we go faster? Yes we can. Let’s add a high-performance primitive collections library called Koloboke from OpenHFT to our project.

        <dependency>
            <groupId>net.openhft</groupId>
            <artifactId>koloboke-api-jdk6-7</artifactId>
            <version>0.6.6</version>
        </dependency>
        <dependency>
            <groupId>net.openhft</groupId>
            <artifactId>koloboke-impl-jdk6-7</artifactId>
            <version>0.6.6</version>
            <scope>runtime</scope>
        </dependency>

We’ll create a mutable IntLongMap so we can add and remove entries as needed.

private static IntLongMap segmentMap = HashIntLongMaps.newMutableMap();

We’ll add a new method that looks just like the others, except it uses “computeIfAbsent” to either get an existing node id for a segment or find it in the index. If you are using Java 7:

            for (int segmentId : segmentIds) {
                final Node node = db.getNodeById(
                        segmentMap.computeIfAbsent(segmentId, new GetSegmentNodeId()));

You’ll need to create an IntToLongFunction class:

    class GetSegmentNodeId implements IntToLongFunction {
        @Override
        public long applyAsLong(int segment) {
            return db.findNode(Labels.Segment, "segmentId", segment).getId();
        }
    }

If you are on Java 8 (make sure you use “-jdk8” in your dependencies) you can just use a lambda:

            for (int segmentId : segmentIds) {
                final Node node = db.getNodeById(
                        segmentMap.computeIfAbsent(segmentId,
                                key -> db.findNode(Labels.Segment, "segmentId", key).getId()));
            }

How much faster is it…

Benchmark                             (size)  Mode  Cnt  Score   Error  Units
ServiceTest.measureOneCachedLookup2    10000  avgt   15  0.004 ± 0.000  ms/op
ServiceTest.measureOneCachedLookup2   100000  avgt   15  0.004 ± 0.000  ms/op
ServiceTest.measureOneCachedLookup2  1000000  avgt   15  0.004 ± 0.000  ms/op

Benchmark                              (size)  Mode  Cnt  Score   Error  Units
ServiceTest.measureTenCachedLookups2    10000  avgt   15  0.006 ± 0.000  ms/op
ServiceTest.measureTenCachedLookups2   100000  avgt   15  0.006 ± 0.000  ms/op
ServiceTest.measureTenCachedLookups2  1000000  avgt   15  0.007 ± 0.000  ms/op

Benchmark                               (size)  Mode  Cnt  Score   Error  Units
ServiceTest.measure1000CachedLookups2    10000  avgt   15  0.231 ± 0.002  ms/op
ServiceTest.measure1000CachedLookups2   100000  avgt   15  0.231 ± 0.003  ms/op
ServiceTest.measure1000CachedLookups2  1000000  avgt   15  0.231 ± 0.005  ms/op

…two to three times as fast in the single and ten lookup tests, and roughly 40% faster in the 1000 lookup test (0.32 ms down to 0.23 ms).

Another Update: This collection is not concurrent, so you’ll have to pre-load all the nodes you are going to cache before using it. If you plan on adding nodes to the cache while the system is running, you’ll have to catch any ConcurrentModificationExceptions and retry, which makes things a little messier than the Guava LoadingCache, which handles concurrency for you.
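
A startup preload could be as simple as walking the known id range with the same findNode call, before any concurrent traffic arrives (a sketch; the loop bound is whatever your highest segmentId is):

    private static void preload(GraphDatabaseService db, int maxSegmentId) {
        // Warm the Koloboke map once at startup, since it is not
        // safe for concurrent modification afterwards.
        try (Transaction tx = db.beginTx()) {
            for (int segmentId = 1; segmentId <= maxSegmentId; segmentId++) {
                final Node node = db.findNode(Labels.Segment, "segmentId", segmentId);
                if (node != null) {
                    segmentMap.put(segmentId, node.getId());
                }
            }
            tx.success();
        }
    }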
