Catalogs and Hierarchies

When I was younger, friends and I would play a role playing game called “Shadowrun“. The game draws elements from science fiction, crime dramas, and magic and blends them all together to make a fun mess. You could be a Dwarf Shaman, an Elf Decker, a Human Rigger, an Orc Adept, a Troll Street Samurai or whatever combination your heart desired. Choosing a gender, race and archetype was just the beginning a more important question: “What is your character going to wear and take on missions?”

When the going gets tough, the tough go shopping!

To answer this question we had the “Street Samurai Catalog” which was a combination weapons catalogue and youtube comments. For example, if we wanted our Street Samurai to carry a weapon, we had a few categories: Pistol, Rifle, Shotgun, Sub Machine Gun, or Melee. If we went with Pistols, you had light pistols like the Beretta 101T or you could go heavy with the Ares Predator.

We have quite a few customer using Neo4j to manage catalog and hierarchy data. Everything from alternate product reviews and recommendations, upgraded component parts, categorical discounts and promotions, you name it. Today I am going to show you two basic things. First getting the whole hierarchy in JSON format, and second following a node up the hiearchy to a category promotion.

First let’s recreate part of the Street Samurai Catalog in Neo4j:

        CREATE (root:Item {id:'0', name:'Street Samurai Catalog'})
        CREATE (child1:Item {id:'1', name:'Rifles'})
        CREATE (child11:Item {id:'11', name:'Sniper'})
        CREATE (child12:Item {id:'12', name:'Machine Gun'})
        CREATE (child2:Item {id:'2', name:'Pistols'})
        CREATE (child21:Item {id:'21', name:'Heavy'})
        CREATE (child211:Item {id:'211', name:'Ares Predator'})
        CREATE (child3:Item {id:'3', name:'Shotguns'})
        CREATE (promotion211:Promotion {id:'p211', name:'Includes Ares Smartgun Link'})
        CREATE (promotion2:Promotion {id:'p2', name:'Free Two-Day Shipping'})
        MERGE (root)-[:HAS_CHILD]->(child1)
        MERGE (child1)-[:HAS_CHILD]->(child11)
        MERGE (child1)-[:HAS_CHILD]->(child12)
        MERGE (root)-[:HAS_CHILD]->(child2)
        MERGE (child2)-[:HAS_CHILD]->(child21)
        MERGE (child21)-[:HAS_CHILD]->(child211)
        MERGE (root)-[:HAS_CHILD]->(child3)
        MERGE (child211)-[:HAS_PROMOTION]->(promotion211)
        MERGE (child2)-[:HAS_PROMOTION]->(promotion2)

Great! Now that it’s in there how do we get a JSON tree out of it? Well it turns out that’s not an easy task in Cypher. We could start at the root and traverse all the way out the HAS_CHILD relationship but that is going to return the same nodes many times.

MATCH p=(root:Item {id:'0'}) -[:HAS_CHILD*0..]->(item)-[:HAS_PROMOTION*0..]->(promotion)
WITH nodes(p) AS nodes
RETURN nodes

We could instead match all the HAS_CHILD relationships and return the parent and the children.

MATCH (parent:Item)-[:HAS_CHILD]->(item)
OPTIONAL MATCH (item)-[:HAS_PROMOTION]->(promotion)
RETURN parent, item, COLLECT(promotion) AS promotions

That’s a little better, but we still need to construct the tree from this result set. I don’t want to do that, I want to get good clean JSON back directly from Neo4j. Luckily for us there is an Awesome Procedure on Cypher that does this for us:

MATCH path=(root:Item {id:'0'})-[:HAS_CHILD*]->(item)-[:HAS_PROMOTION*0..]->(promotion)
WITH COLLECT(path) AS paths
CALL apoc.convert.toTree(paths) YIELD value RETURN value

The query above produces the JSON below which is really nice.

{
              "_type": "Item",
              "name": "Street Samurai Catalog",
              "_id": 0,
              "id": "0",
              "has_child": [
                {
                  "_type": "Item",
                  "name": "Rifles",
                  "_id": 1,
                  "id": "1",
                  "has_child": [
                    {
                      "_id": 2,
                      "_type": "Item",
                      "name": "Sniper",
                      "id": "11"
                    }
...

Now what happens when we have a much bigger catalog? Let’s find out by creating one starting with a root node:

CREATE (u:Item {id:"1-Root"})

We will add an Index to make our life easier:

CREATE INDEX ON :Item(id);

We are going to follow the same steps we took in a previous blog post and create a 2 child per node catalog of about 1 million items.

WITH ["Ares","Beretta","Ceska","Colt","Heckler & Koch","Fichetti Security","Ruger","Seco",
"Walther","Winchester"] AS products
MATCH (item:Item) WHERE item.id STARTS WITH "1-"
WITH range(1,2) as children, item, products
FOREACH (id in children | CREATE (item)-[:HAS_CHILD]->(:Item {id:"2-" + 
products[id% size(products)]+id}) );

We continue this incrementing the “1-” and “2-” until we get to level 20:

WITH ["Ares","Beretta","Ceska","Colt","Heckler & Koch","Fichetti Security","Ruger","Seco",
"Walther","Winchester"] AS products
MATCH (item:Item) WHERE item.id STARTS WITH "19-"
WITH range(1,2) as children, item, products
FOREACH (id in children | CREATE (item)-[:HAS_CHILD]->(:Item {id:"20-" + 
products[id% size(products)]+id}) );

Now I can run the following cypher and get the tree at different depths using *1..X and timing them. For my tests I will run 4 threads on my 4 core desktop for 20 seconds and get the requests per second and latencies of the second run. At a depth of 14, it was taking about a second and half to return… maybe we can do better?

MATCH path=(root:Item)-[:HAS_CHILD*1..14]->(item)-[:HAS_PROMOTION*0..]->(promotion)
WHERE root.id = '1-Root'
WITH COLLECT(path) AS paths
CALL apoc.convert.toTree(paths) YIELD value RETURN value

Let’s get our hands dirty and write an extension. Our API call is going to very simple we just need the root of the tree and an optional max depth to traverse to.

    @GET
    @Path("/catalog/{id}")
    @Produces({"application/json"})
    public Response catalog(@PathParam("id") String id, 
                            @DefaultValue("999") @QueryParam("depth") int maxDepth, 
                            @Context final GraphDatabaseService db) throws IOException {

We want to stream our output as soon as we have something, so let’s set a JsonGenerator to do that:

StreamingOutput stream = os -> {
            JsonGenerator jg = objectMapper
                              .getJsonFactory()
                              .createJsonGenerator(os, JsonEncoding.UTF8);

First thing we want to do is find our root or starting point of this traversal, and then we can recursively get its properties and children:

root = db.findNode(Labels.Item, "id", id);
if (root != null) {
    getPropertiesAndChildrenRecursively(root, jg, 0, maxDepth);
}

The getPropertiesAndChildrenRecursively method will start writing the properties of the node, but it won’t close it as we also want to get Promotions and all its children.

private void getPropertiesAndChildrenRecursively(Node node, JsonGenerator jg, int depth, int maxDepth) 
throws IOException {
        jg.writeStartObject();
        for (Map.Entry<String, Object> entry : node.getAllProperties().entrySet()) {
            jg.writeFieldName(entry.getKey());
            jg.writeObject(entry.getValue());
        }

Next we create an Array of promotions and fill it with any the node has by following the HAS_PROMOTION relationship:

        jg.writeArrayFieldStart("promotions");
        for (Relationship rel : node.getRelationships(RelationshipTypes.HAS_PROMOTION, 
                                                      Direction.OUTGOING)) {
            Node promotion = rel.getEndNode();
            jg.writeObject(properties.get(promotion.getId()));
        }
        jg.writeEndArray();

Then we will create a second array for its children and call getPropertiesAndChildrenRecursively for each child.

        jg.writeArrayFieldStart("children");
        depth++;
        if (depth <= maxDepth) {
            // recursively do the same thing for every child node
            for (Relationship rel : node.getRelationships(RelationshipTypes.HAS_CHILD, 
                                                          Direction.OUTGOING)) {
                Node nextNode = rel.getEndNode();
                getPropertiesAndChildrenRecursively(nextNode, jg, depth, maxDepth);
            }
        }

Once all the children have been processed, we can close the array and object:

jg.writeEndArray();
jg.writeEndObject();

Now when we call “:GET /v1/service/catalog/0” on the browser, we get:

{
  "name": "Street Samurai Catalog",
  "id": "0",
  "promotions": [],
  "children": [
    {
      "name": "Shotguns",
      "id": "3",
      "promotions": [],
      "children": []
    },
    {
      "name": "Pistols",
      "id": "2",
      "promotions": [
        {
          "name": "Free Two-Day Shipping",
          "id": "p2"
        }
      ]
...

Turning back to the larger graph I ran cypher + apoc vs the extension and this is what I ended up with:

The extension starts out being able to respond with twice as many requests per second in the beginning and as we go deeper into the tree it gets to about 5-10x faster.

The mean latencies hold at less than 1 millisecond up to a depth of 4. At depth 16 the cypher queries are returning after about 6 seconds and the extension crosses the 1 second mark. Finally with the whole tree, we are at about 1 minute vs 10 seconds.

Now it is rare for our customers to want to get a whole tree back from Neo4j. Usually they are more interested in seeing if an item has any discounts or promotions tied to it. Traversing from the root of the tree down costs us about 1 million relationship traversals… BUT going from the bottom of the tree up to the root only costs us 20 traversals. Plus checking each of those 20 for a HAS_PROMOTION relationship, so worst case 40. Neo4j can do that in under 1ms.

MATCH (start:Item {id:'211'}) <-[:HAS_CHILD*0..999]-(item)-[:HAS_PROMOTION]->(p)
RETURN {promotions: COLLECT(p)} AS promotions

But don’t take my word for it. Load up your product catalog and try it out. Source code as always is on github.

Max De Marzi

Graphs, Graphs, and nothing but the Graphs