Translating Cypher To Neo4j Java API 2.0


About 6 months ago we looked at how to translate a few lines of Cypher into way too much Java code in version 1.9.x. Since then Cypher has changed and I suck a little less at Java, so I wanted to share a few different ways to translate one into the other, just in case you are stuck in a mid-eighties time warp and are paid by the number of lines of code you write per hour.

But first, lemme take a #Selfie… er, let’s make some data. Michael Hunger has a series of blog posts on getting and creating data in Neo4j, so we’ll steal, er, borrow his ideas. Let’s create 100k nodes:

WITH ["Jennifer","Michelle","Tanya","Julie","Christie","Sophie","Amanda","Khloe","Sarah","Kaylee"] AS names 
FOREACH (r IN range(0,100000) | CREATE (:User {username:names[r % size(names)]+r}))
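
If the naming expression looks cryptic, here is the same idea as a minimal Java sketch. UsernameSketch and usernameFor are made-up names for illustration; the only thing taken from the query is the names[r % size(names)] + r logic. Note that range(0,100000) includes both ends, so the query creates 100,001 nodes, and the user we query later, Michelle1, is simply r = 1.

```java
// Plain-Java mirror of the Cypher expression names[r % size(names)] + r.
// UsernameSketch and usernameFor are illustrative names, not part of any Neo4j API.
class UsernameSketch {
    static final String[] NAMES = {"Jennifer", "Michelle", "Tanya", "Julie", "Christie",
            "Sophie", "Amanda", "Khloe", "Sarah", "Kaylee"};

    static String usernameFor(int r) {
        // Cycle through the ten names and append the counter so every username is unique.
        return NAMES[r % NAMES.length] + r;
    }

    public static void main(String[] args) {
        // Print the first dozen usernames to show the cycle wrapping around.
        for (int r = 0; r < 12; r++) {
            System.out.println(usernameFor(r));
        }
    }
}
```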

…and let’s create around 500k relationships between them:

MATCH (u1:User),(u2:User)
WITH u1,u2
LIMIT 5000000
WHERE rand() < 0.1
CREATE (u1)-[:FRIENDS]->(u2);

…and let’s not forget to add an index:

CREATE INDEX ON :User(username);

Now when we look at our data we can see:

[Screenshot of the generated data]

Now if we wanted to build a recommendation for the top 10 Users that “Michelle1” should be friends with but isn’t right now, we’d write something like this:

MATCH (me:User {username:'Michelle1'}) -[:FRIENDS]- people -[:FRIENDS]- fof
WHERE NOT(me -[:FRIENDS]- fof)
RETURN fof, COUNT(people) AS friend_count
ORDER BY friend_count DESC
LIMIT 10

…and we’d get an error like this after the 60 second timeout in the Browser window:

[Screenshot of the timeout error in the Browser]
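
Before leaving Cypher behind, it helps to spell out what the query is asking for: walk two FRIENDS hops out from me, skip anyone who is already my friend, and count how many paths land on each remaining user. Here is a minimal in-memory sketch of that logic in plain Java. FofSketch, addFriendship and recommend are hypothetical names, the graph is just a map of sets rather than Neo4j, and explicitly skipping the starting user is my assumption about the intent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory illustration only; none of these names come from Neo4j.
class FofSketch {
    // Undirected adjacency: username -> usernames connected by a FRIENDS relationship.
    final Map<String, Set<String>> friends = new HashMap<>();

    void addFriendship(String a, String b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Returns up to `limit` (fof, friend_count) entries, highest count first.
    List<Map.Entry<String, Integer>> recommend(String me, int limit) {
        Set<String> mine = friends.getOrDefault(me, Collections.<String>emptySet());
        Map<String, Integer> counts = new HashMap<>();
        for (String person : mine) {
            for (String fof : friends.getOrDefault(person, Collections.<String>emptySet())) {
                // Skip the starting user and anyone who is already a direct friend.
                if (!fof.equals(me) && !mine.contains(fof)) {
                    counts.merge(fof, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> ordered = new ArrayList<>(counts.entrySet());
        ordered.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return ordered.subList(0, Math.min(limit, ordered.size()));
    }

    public static void main(String[] args) {
        FofSketch graph = new FofSketch();
        graph.addFriendship("me", "A");
        graph.addFriendship("me", "B");
        graph.addFriendship("A", "X");
        graph.addFriendship("B", "X");
        graph.addFriendship("A", "Y");
        System.out.println(graph.recommend("me", 10));
    }
}
```

In the tiny graph built in main, X is reachable through both A and B while Y is reachable only through A, so X ranks first.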

Cypher as of 2.0.2 isn’t optimized for this kind of query (it’s coming), so let’s turn to the Java API. First thing we’ll want to do is find a user and then get their friends just to get used to the new Java API methods.

    @GET
    @Path("/friends/{username}")
    public Response getFriends(@PathParam("username") String username, @Context GraphDatabaseService db) throws IOException {
        List<String> results = new ArrayList<String>();
        // Every read has to happen inside a transaction.
        try ( Transaction tx = db.beginTx() )
        {
            // Find the user by label and property; null if there is no such user.
            final Node user = IteratorUtil.singleOrNull(db.findNodesByLabelAndProperty(DynamicLabel.label("User"), "username", username));

            if(user != null){
                // Collect the username at the other end of each outgoing FRIENDS relationship.
                for ( Relationship relationship : user.getRelationships(FRIENDS, Direction.OUTGOING) ){
                    Node friend = relationship.getOtherNode(user);
                    results.add((String)friend.getProperty("username"));
                }
            }
        }

        return Response.ok().entity(objectMapper.writeValueAsString(results)).build();
    }
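
The try ( Transaction tx = db.beginTx() ) form is Java 7’s try-with-resources, which closes the resource when the block exits, even if an exception is thrown. Here is a small sketch of that lifecycle. FakeTx is a made-up class standing in for a real transaction, and the log list exists only to make the order of events visible.

```java
import java.util.ArrayList;
import java.util.List;

class TxSketch {
    // Records the order of events so the lifecycle can be inspected.
    static final List<String> log = new ArrayList<>();

    // Hypothetical stand-in for a transaction handle; not the Neo4j Transaction class.
    static class FakeTx implements AutoCloseable {
        FakeTx() { log.add("begin"); }
        @Override public void close() { log.add("close"); }
    }

    // Runs a unit of "work" inside the resource block and returns the event order.
    static List<String> run(boolean fail) {
        log.clear();
        try (FakeTx tx = new FakeTx()) {
            log.add("work");
            if (fail) {
                throw new IllegalStateException("boom");
            }
        } catch (IllegalStateException e) {
            // By the time we get here the resource has already been closed.
            log.add("caught");
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

Either way close() runs before control leaves the block, which is why the endpoint does not need an explicit finally to clean up.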

Instead of going to the index directly, we are using the findNodesByLabelAndProperty method to find our user. Notice also that everything is wrapped in a try block with a transaction: in 2.0 all interactions with the database have to be inside a transaction. With that out of the way, let’s take a look at getting the top 10 friends of friends who are not my current friends, ordered by the number of mutual friends, in Java:

    @GET
    @Path("/fofs/{username}")
    public Response getFofs(@PathParam("username") String username, @Context GraphDatabaseService db) throws IOException {
        List<Map<String, Object>> results = new ArrayList<>();

        // Friend of friend -> number of mutual friends.
        HashMap<Node, MutableInt> fofs = new HashMap<>();
        try ( Transaction tx = db.beginTx() )
        {
            final Node user = IteratorUtil.singleOrNull(db.findNodesByLabelAndProperty(DynamicLabel.label("User"), "username", username));

            // Count, sort, then keep at most the top 10.
            findFofs(fofs, user);
            List<Map.Entry<Node, MutableInt>> fofList = orderFofs(db, fofs);
            returnFofs(results, fofList.subList(0, Math.min(fofList.size(), 10)));
        }

        return Response.ok().entity(objectMapper.writeValueAsString(results)).build();
    }
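
The ordering and output steps boil down to a sort and a copy. Here is a minimal sketch of the idea, with plain strings standing in for Node and a tiny Counter class standing in for MutableInt. The method bodies, the dropped db parameter and the "username" and "friend_count" keys are all my assumptions for illustration, not the code from the repository.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderSketch {
    // Minimal stand-in for a mutable counter such as MutableInt.
    static class Counter {
        int value;
        Counter(int value) { this.value = value; }
    }

    // Sort entries by count, highest first (the ORDER BY friend_count DESC step).
    static List<Map.Entry<String, Counter>> orderFofs(Map<String, Counter> fofs) {
        List<Map.Entry<String, Counter>> list = new ArrayList<>(fofs.entrySet());
        list.sort((a, b) -> Integer.compare(b.getValue().value, a.getValue().value));
        return list;
    }

    // Copy each entry into a simple map that a JSON serializer can write out.
    static void returnFofs(List<Map<String, Object>> results, List<Map.Entry<String, Counter>> top) {
        for (Map.Entry<String, Counter> entry : top) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("username", entry.getKey());
            row.put("friend_count", entry.getValue().value);
            results.add(row);
        }
    }

    public static void main(String[] args) {
        Map<String, Counter> fofs = new HashMap<>();
        fofs.put("a", new Counter(1));
        fofs.put("b", new Counter(3));
        fofs.put("c", new Counter(2));

        List<Map.Entry<String, Counter>> ordered = orderFofs(fofs);
        List<Map<String, Object>> results = new ArrayList<>();
        // Keep at most the top 2 here; the endpoint keeps the top 10.
        returnFofs(results, ordered.subList(0, Math.min(ordered.size(), 2)));
        System.out.println(results);
    }
}
```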

I’ve placed findFofs, orderFofs and returnFofs in their own methods. We’re going to take a look at findFofs first, and I want you to pay attention because there is a glaring bug that I missed the first time I did this and that I am replicating here. See if you can spot it.

    private void findFofs(HashMap<Node, MutableInt> fofs, Node user) {
        List<Node> friends = new ArrayList<>();

        if (user != null){
            getFirstLevelFriends(user, friends);
            getSecondLevelFriends(fofs, user, friends);
        }
    }

    private void getFirstLevelFriends(Node user, List<Node> friends) {
        for ( Relationship relationship : user.getRelationships(FRIENDS, Direction.BOTH) ){
            Node friend = relationship.getOtherNode(user);
            friends.add(friend);
        }
    }

Now, here is where you really want to pay attention…

    private void getSecondLevelFriends(HashMap<Node, MutableInt> fofs, Node user, List<Node> friends) {
        for ( Node friend : friends ){
            for (Relationship otherRelationship : friend.getRelationships(FRIENDS, Direction.BOTH) ){
                Node fof = otherRelationship.getOtherNode(friend);
                if ((!user.equals(fof) && !friends.contains(fof))) {
                    MutableInt mutableInt = fofs.get(fof);
                    if (mutableInt == null) {
                        fofs.put(fof, new MutableInt(1));
                    } else {
                        mutableInt.increment();
                    }
                }
            }
        }
    }

Saw it? Me neither. Let’s test the performance of this endpoint using ApacheBench:

ab -k -c 1 -n 1 'http://127.0.0.1:7474/example/service/fofs/Michelle1'

Our results are WAY better than before: 2.670 seconds vs the timeouts we were seeing before.

Concurrency Level:      1
Time taken for tests:   2.670 seconds
Complete requests:      1
Failed requests:        0
Write errors:           0
Keep-Alive requests:    1
Total transferred:      655 bytes
HTML transferred:       522 bytes
Requests per second:    0.37 [#/sec] (mean)
Time per request:       2670.414 [ms] (mean)
Time per request:       2670.414 [ms] (mean, across all concurrent requests)
Transfer rate:          0.24 [Kbytes/sec] received

That’s a huge improvement, but Neo4j performs millions of traversals per second and can provide real time recommendations… 2.670 seconds just doesn’t sound right. So let’s dig in a little by using YourKit.

YourKit

YourKit is a Java profiler which we can attach to a running Neo4j server and it’ll let us see what’s going on when we throw a little more load at it than 1 request. It’s not obvious but when you run Neo4j the name it shows up under is “Bootstrapper”. Take a look at the YourKit manual for more details.

[Screenshot: attaching YourKit]

ab -k -c 8 -n 800 'http://127.0.0.1:7474/example/service/fofs/Michelle1'

A little while after we start collecting profile information and begin running our test, this pops up:

[Screenshot of the YourKit profiler]

Uh oh… something is obviously wrong… let’s dig in.

[Screenshot of the YourKit call tree]

So something in getSecondLevelFriends is wasting time doing what looks like nothing…

    private void getSecondLevelFriends(HashMap<Node, MutableInt> fofs, Node user, List<Node> friends) {
        for ( Node friend : friends ){
            for (Relationship otherRelationship : friend.getRelationships(FRIENDS, Direction.BOTH) ){
                Node fof = otherRelationship.getOtherNode(friend);
                if ((!user.equals(fof) && !friends.contains(fof))) {

… and there it is. We’re calling contains on a List of Nodes instead of a Set of Nodes, so it’s going to scan the whole list instead of going right to the entry. It’s an O(n) vs O(1) type of problem because I used the wrong data structure. So let’s change this to a Set and try it again.
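
To see why the data structure matters, here is a small sketch that counts how many equals() calls a single failed lookup costs in an ArrayList versus a HashSet. Probe is a made-up stand-in for a node and ContainsSketch is a hypothetical name; this is plain Java collections, not the Neo4j code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ContainsSketch {
    // Global tally of equals() invocations, reset before each measurement.
    static int equalsCalls = 0;

    // Hypothetical stand-in for a graph node, identified by an id.
    static final class Probe {
        final int id;
        Probe(int id) { this.id = id; }

        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Probe && ((Probe) o).id == id;
        }

        @Override public int hashCode() { return id; }
    }

    // How many equals() calls does one lookup of an absent element cost?
    static int costOfMiss(Collection<Probe> friends) {
        equalsCalls = 0;
        friends.contains(new Probe(5000)); // id 5000 is never added in the demo below
        return equalsCalls;
    }

    public static void main(String[] args) {
        List<Probe> asList = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            asList.add(new Probe(i));
        }
        Set<Probe> asSet = new HashSet<>(asList);

        System.out.println("List lookup equals() calls: " + costOfMiss(asList));
        System.out.println("Set lookup equals() calls:  " + costOfMiss(asSet));
    }
}
```

The list has to compare against every element before it can say no, while the hash set jumps to one bucket and needs few or no comparisons. Since that check runs once per friend-of-friend relationship, the linear scan gets multiplied by a very large number.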

ab -k -c 1 -n 1 'http://127.0.0.1:7474/example/service/fofs/Michelle1'

Our results are WAY better than before. 91 milliseconds vs the 2.670 seconds we were taking before, vs the timeout from where we started.

Concurrency Level:      1
Time taken for tests:   0.091 seconds
Complete requests:      1
Failed requests:        0
Write errors:           0
Keep-Alive requests:    1
Total transferred:      655 bytes
HTML transferred:       522 bytes
Requests per second:    10.99 [#/sec] (mean)
Time per request:       91.019 [ms] (mean)
Time per request:       91.019 [ms] (mean, across all concurrent requests)
Transfer rate:          7.03 [Kbytes/sec] received

Let’s try giving it some load:

ab -k -c 8 -n 800 'http://127.0.0.1:7474/example/service/fofs/Michelle1'

… and now we’re getting 55 real-time recommendations per second on my laptop.

Concurrency Level:      8
Time taken for tests:   14.536 seconds
Complete requests:      800
Failed requests:        0
Write errors:           0
Keep-Alive requests:    800
Total transferred:      524000 bytes
HTML transferred:       417600 bytes
Requests per second:    55.04 [#/sec] (mean)
Time per request:       145.361 [ms] (mean)
Time per request:       18.170 [ms] (mean, across all concurrent requests)
Transfer rate:          35.20 [Kbytes/sec] received
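
The derived numbers in that report are simple arithmetic on the totals, and it is worth knowing how to read them. A minimal sketch, assuming ApacheBench’s usual formulas (the class and method names are mine):

```java
// Recomputes the derived figures of an ApacheBench report from its totals.
class AbMathSketch {
    // Completed requests divided by wall-clock time.
    static double requestsPerSecond(int requests, double seconds) {
        return requests / seconds;
    }

    // Mean latency seen by one client: each of the concurrent clients waits this long per request.
    static double meanTimePerRequestMs(int requests, int concurrency, double seconds) {
        return seconds * 1000.0 * concurrency / requests;
    }

    // Mean time per request across all concurrent clients: wall-clock time divided by requests.
    static double timeAcrossConcurrentMs(int requests, double seconds) {
        return seconds * 1000.0 / requests;
    }

    public static void main(String[] args) {
        int requests = 800;
        int concurrency = 8;
        double seconds = 14.536;
        System.out.println("Requests per second: " + requestsPerSecond(requests, seconds));
        System.out.println("Time per request (mean, ms): " + meanTimePerRequestMs(requests, concurrency, seconds));
        System.out.println("Time per request (across all, ms): " + timeAcrossConcurrentMs(requests, seconds));
    }
}
```

So the 145 ms figure is what each of the 8 clients experiences per request, while the 18 ms figure is just the inverse of the throughput.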

As always, the full source code is available on GitHub.

One last thing… in Neo4j 2.1… it goes almost twice as fast.

Concurrency Level:      8
Time taken for tests:   8.523 seconds
Complete requests:      800
Failed requests:        0
Write errors:           0
Keep-Alive requests:    800
Total transferred:      524000 bytes
HTML transferred:       417600 bytes
Requests per second:    93.86 [#/sec] (mean)
Time per request:       85.234 [ms] (mean)
Time per request:       10.654 [ms] (mean, across all concurrent requests)
Transfer rate:          60.04 [Kbytes/sec] received

Now that’s Amazing.


One thought on “Translating Cypher To Neo4j Java API 2.0”

  1. This is an amazing blog post.
    Is it possible to hint to cypher how to best approach the query, such that one does not need to use unmanaged extensions or the java api?
