One Direction Relationships in Neo4j

onedirectionchop

In the Neo4j Property Graph model, every single Relationship must be Typed and Directed. This means they must have a specific name (FRIENDS, LIKES, FOLLOWS, etc) and have a Start Node and an End Node to show direction. What’s neat is that when you write your queries you can choose to ignore that. The following queries are all valid:

// Get all the people I follow 
MATCH (u1:Person)-[:FOLLOWS]->(u2:Person)
WHERE u1.username = "maxdemarzi"
RETURN u2.username

// Get all the people that I follow or follow me
MATCH (u1:Person)-[:FOLLOWS]-(u2:Person)
WHERE u1.username = "maxdemarzi"
RETURN u2.username

// Get all the people related to me 
MATCH (u1:Person)--(u2:Person)
WHERE u1.username = "maxdemarzi"
RETURN u2.username

Let’s say we create a new FOLLOWS relationship from Node 1 to Node 2. Neo4j will update both of the nodes affected, adding an entry to the FOLLOWS-Outgoing list of Node 1, and an entry to the FOLLOWS-Incoming list of Node 2. This is a very powerful feature because it means I can traverse this relationship from either node and get to the other. This is done Atomically, so you never end up with half a relationship entry in the database and it keeps your database consistent at all times (unlike some other databases).

99.9% of the time, this is what we want. To be able to traverse relationships from either direction. So lets talk about the 0.1%. Let’s say you are building a mobile shopping application where your users can “Swipe Right” if they like an item, “Swife Left” if they don’t like an item, and “Double Tap” to add it to their shopping basket.

Tinder-Kwoller

We could model this two ways in Neo4j. The first way is the standard way. Every user action results in a Relationship for either LIKES, DISLIKES, or PURCHASES. From a User point of view, you want to know what items you don’t like, so they can be excluded from recommendation or search queries. However from an Item point of view, would you ever need to know which users didn’t like it? Let’s try this from a Dating application point of view. What feature would your users appreciate that shows them the people who don’t like them? Quick somebody get a hold of the Tinder data and create an app called “Asholes” that shows you all the people who swiped left!

Ashes-Heart-01

Right, that would be a terrible idea. So then, when Neo4j creates a new Relationship it uses up 34 bytes. It needs all that because it has pointers to the start node and end node as well as to the first property of the relationship and the rest of the relationship chain. So if you are expecting a ton of these types of relationships what can you do?

8bitroar

You can get help from Daniel Lemire and his Roaring Bitmap library. What we are going to do is not create the DISLIKES relationships, instead we are going to create a compressed bitmap and store the Node Ids of the Items the user didn’t like. Then we’re going to store this bitmap as a byte array in the user node property “dislikes” like so:

        try (Transaction tx = db.beginTx()) {
            Node user = db.findNode(Labels.User, "username", "maxdemarzi");
            Node thingDisliked = db.findNode(Labels.Thing, "name", "One Direction Boy Band");
            int thingDislikedNodeId = ((Number) thingDisliked.getId()).intValue();

            RoaringBitmap dislikes = new RoaringBitmap();
            dislikes.add(thingDislikedNodeId);

            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            dislikes.serialize(new DataOutputStream(baos));
            user.setProperty("dislikes", baos.toByteArray());
            tx.success();
        }

Great, but how do we get the node ids back? We just do it backwards and deserialize the byte array back into a RoaringBitmap.

    private static RoaringBitmap getRoaringBitmap(Node user, String property) throws IOException {
        RoaringBitmap rb = new RoaringBitmap();
        byte[] nodeIds = (byte[])user.getProperty(property);
        ByteArrayInputStream bais = new ByteArrayInputStream(nodeIds);
        rb.deserialize(new DataInputStream(bais));
        return rb;
    }

We would call this method and use db.getNodeById to get to the Item nodes we dislike:

Set<Node> dislikedItems = new HashSet<>();

RoaringBitmap dislikes = getRoaringBitmap(user, "dislikes");
for (int nodeId : dislikes.toArray()) {
   dislikedItems.add(db.getNodeById(nodeId));
}

So let’s use this in a recommendation engine. First, we will find all the items the target user PURCHASED. Then we’ll use the Incoming PURCHASED and LIKES relationships to find the top 25 users with similar tastes as our target user. For every item we purchased, we’ll give 5 points to any user who also bought it and 3 points to any user who liked it.

    private static HashMap<Node, MutableInt> getOtherUsers(Set<Node> likedItems, Set<Node> purchasedItems, Node user) {
        HashMap<Node, MutableInt> otherUsers = new HashMap<>();

        for (Node item : purchasedItems) {
            // Give 5 points to every person who purchased an Item I also purchased
            for (Relationship rel : item.getRelationships(RelationshipTypes.PURCHASED, Direction.INCOMING)) {
                Node otherUser = rel.getStartNode();
                MutableInt mutableInt = otherUsers.get(otherUser);
                if (mutableInt == null) {
                    otherUsers.put(otherUser, new MutableInt(5));
                } else {
                    mutableInt.add(5);
                }
            }
            // Give 3 points to every person who liked an Item I purchased
            for (Relationship rel : item.getRelationships(RelationshipTypes.LIKES, Direction.INCOMING)) {
                Node otherUser = rel.getStartNode();
                MutableInt mutableInt = otherUsers.get(otherUser);
                if (mutableInt == null) {
                    otherUsers.put(otherUser, new MutableInt(3));
                } else {
                    mutableInt.add(3);
                }
            }
        }

For every item we liked, we’ll give 2 points to any user who also liked it and 1 points to any user who bought it.

        for (Node item : likedItems) {
            // Give 2 points to every person who liked an Item I also liked
            for (Relationship rel : item.getRelationships(RelationshipTypes.LIKES, Direction.INCOMING)) {
                Node otherUser = rel.getStartNode();
                MutableInt mutableInt = otherUsers.get(otherUser);
                if (mutableInt == null) {
                    otherUsers.put(otherUser, new MutableInt(2));
                } else {
                    mutableInt.add(2);
                }
            }
            // Give 1 point to every person who purchased an Item I liked
            for (Relationship rel : item.getRelationships(RelationshipTypes.PURCHASED, Direction.INCOMING)) {
                Node otherUser = rel.getStartNode();
                MutableInt mutableInt = otherUsers.get(otherUser);
                if (mutableInt == null) {
                    otherUsers.put(otherUser, new MutableInt(1));
                } else {
                    mutableInt.increment();
                }
            }
        }
        // Remove self from similar users
        otherUsers.remove(user);
        return otherUsers;
    }

We’ll take the top 25 scoring users, and then find the items they liked or purchased. We’ll give 1 point every time an item appears on the list.

    private static HashMap<Node, MutableInt> getOtherItems(ArrayList<Node> similarUsers) {
        HashMap<Node, MutableInt> otherItems = new HashMap<>();
        for (Node similarUser : similarUsers) {
            for (Relationship rel : similarUser.getRelationships(Direction.OUTGOING,
                    RelationshipTypes.PURCHASED, RelationshipTypes.LIKES)) {
                Node item = rel.getEndNode();
                MutableInt mutableInt = otherItems.get(item);
                if (mutableInt == null) {
                    otherItems.put(item, new MutableInt(1));
                } else {
                    mutableInt.increment();
                }
            }
        }
        return otherItems;
    }

Then we will eliminate the items the user already bought, liked, or disliked. Remember our dislikedItems came from the RoaringBitmap.

         
                for (Node item : purchasedItems) {
                    otherItems.remove(item);
                }
                for (Node item : likedItems) {
                    otherItems.remove(item);
                }
                for (Node item : dislikedItems) {
                    otherItems.remove(item);
                }

Finally we recommend the top 10 items from those remaining. Checkout the full source code on github.

I also wrote another set of JMH tests using both models and the performance is pretty close.

Benchmark                              Mode  Cnt    Score    Error  Units
ServiceBenchmark.measureRecommend     thrpt    5  339.319 ±  9.334  ops/s
ServiceBenchmark.measureRecommend2    thrpt    5  376.071 ± 22.711  ops/s

So how much of a cost savings is it memory wise? Here is a test that shows the size of a RoaringBitmap as we add 3 items to it and then 1000 more ids. We are at 18 bytes vs 34 with 1 item, 20 vs 68 with 2 items, 30 vs 102 with 3 items, and so on. The final size was around 8k which beats 34k, but if the node ids would have been less spread out the compression would have been greater.

    @Test
    public void shouldbeSmallSize() throws IOException {
        RoaringBitmap dislikes = new RoaringBitmap();
        dislikes.add(1);
        assertEquals(18, dislikes.serializedSizeInBytes());
        dislikes.add(1000);
        assertEquals(20, dislikes.serializedSizeInBytes());
        dislikes.add(100000000);
        assertEquals(30, dislikes.serializedSizeInBytes());
        int maxUsers = 100000000;
        Random rand = new Random();
        for (int i = 0; i < 1000; i++){
            int randomItem = rand.nextInt(maxUsers);
            dislikes.add(randomItem);
        }
        int sizeInRelationships = 34 * 1000;
        assertTrue(dislikes.serializedSizeInBytes() <= sizeInRelationships);
}

One way relationships as a compressed bitmap is a bit of an advanced hack. It came into being because a certain customer had 1 billion one way relationships that were wasting too much memory. Another customer stores a bloom filter as a node property. They use it as a preliminary filter before traversing the graph. One of the things I absolutely love about Neo4j is that you have complete freedom to do whatever you want. The days of treating a database like dumb storage are over, so get creative and go crazy.

Tagged , , , , ,

6 thoughts on “One Direction Relationships in Neo4j

  1. Really inspiring blog post. Thanks for sharing !

  2. Adam Rogas says:

    You know this app would be huge :) “Quick somebody get a hold of the Tinder data and create an app called “Asholes” that shows you all the people who swiped left!”

  3. Is there a performance difference if you traverse a relationship in Cypher against its direction?

  4. […] Bitmaps saved as a byte array property on each node. In this case, not to keep track of “dislikes” but instead to keep track of which node id and property key combinations a user is allowed […]

  5. […] of the users who liked the post. We have seen an example of half of this before on a previous post creating one way relationships. If we used a bitmap we’d lose the order of the likes. So another alternative is to use a […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: