Gremlin with Neography

Gremlin is a domain-specific language for traversing property graphs. Neo4j is one of the databases that can speak the Gremlin language, and as promised I’ll show you how you can use it to implement friend recommendations as well as degrees of separation.

We can send any Gremlin script to Neo4j via the REST API and Neography using the execute_script method. Let’s implement suggestions_for so it sends a Gremlin script to the server:

def suggestions_for(node)
  node_id = node["self"].split('/').last.to_i
  @neo.execute_script("g.v(node_id).
                         in('friends').
                         in('friends').
                         dedup.
                         filter{it != g.v(node_id)}.
                         name", {:node_id => node_id})
end

puts "Johnathan should become friends with #{suggestions_for(johnathan).join(', ')}"

# RESULT
# Johnathan should become friends with Mary, Phil
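
The first line of suggestions_for pulls the node id out of the node's "self" URL. Here is a minimal sketch of just that step. The node_id_from helper name and the example URL are made up for illustration; they are not part of Neography.

```ruby
# Hypothetical helper: extracts the numeric id from a node hash shaped
# like the ones the REST API hands back.
def node_id_from(node)
  # The id is the last path segment of the node's "self" URL.
  node["self"].split('/').last.to_i
end

# Illustrative node hash; the URL is an assumption, not a real server.
example_node = { "self" => "http://localhost:7474/db/data/node/42" }
puts node_id_from(example_node)
```

The to_i matters because the script parameter should be a number rather than a string.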


Let’s go through the Gremlin steps:

g

is our graph.

v(node_id)

is the vertex with the id held in node_id (which is passed in as a parameter at the end of the call). In Gremlin a node is a vertex, and a relationship is an edge.

in('friends')

tells Gremlin we want to traverse incoming relationships of type “friends”. We do this twice because we want friends of friends.

dedup

removes any duplicate nodes we found along the way… you know the popular kids.

filter{it != g.v(node_id)}

removes the original node (or vertex) from the list and finally

name

grabs the name property of the nodes we found. Parameterize your scripts whenever possible: the server can then avoid re-parsing the script on every call, which improves performance.
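
If you want to see what those traversal steps do without a server, here is a rough plain-Ruby imitation. The graph below is a hypothetical stand-in (I am assuming mutual friendships between these four people), and suggestions_in_memory is an illustrative name, not a Neography method.

```ruby
# Hypothetical in-memory graph: each name maps to the people who have a
# "friends" relationship pointing at them (the incoming side).
INCOMING_FRIENDS = {
  "Johnathan" => ["Mark"],
  "Mark"      => ["Johnathan", "Mary", "Phil"],
  "Mary"      => ["Mark", "Phil"],
  "Phil"      => ["Mark", "Mary"]
}.freeze

def suggestions_in_memory(start)
  # First in('friends'): who points at the start node?
  friends = INCOMING_FRIENDS.fetch(start, [])
  # Second in('friends'): who points at each of those friends?
  friends_of_friends = friends.flat_map { |f| INCOMING_FRIENDS.fetch(f, []) }
  # dedup, then filter out the start node itself.
  friends_of_friends.uniq.reject { |name| name == start }
end

puts suggestions_in_memory("Johnathan").join(', ')
```

Note that this sketch, like the script it imitates, does not remove people who are already direct friends of the start node. Subtracting the friends array from the result would be one way to handle that.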

How about degrees of separation?

def degrees_of_separation(start_node, destination_node)
  start_node_id = start_node["self"].split('/').last.to_i
  destination_node_id = destination_node["self"].split('/').last.to_i
  @neo.execute_script("g.v(start_node_id).
                         as('x').
                         in.loop('x'){it.loops <= 4 &&
                                      it.object.id != destination_node_id}.
                         simplePath.
                         filter{it.id == destination_node_id}.
                         paths{it.name}", {:start_node_id => start_node_id,
                                           :destination_node_id => destination_node_id })
end

degrees_of_separation(johnathan, mary).each do |path|
  puts "#{path.size - 1} degrees: " + path.join(' => friends => ')
end

# RESULT
# 3 degrees: Johnathan => friends => Mark => friends => Phil => friends => Mary
# 2 degrees: Johnathan => friends => Mark => friends => Mary

This one is a bit trickier. We once again start with our graph

g

then go to the start node (or vertex)

v(start_node_id)

We name this step x with the

as('x')

command because we’ll want to come back here later.

in.loop('x')

tells Gremlin to traverse incoming relationships and keep looping back to x for as long as the condition that follows is true. Gremlin keeps track of the number of loops you’ve taken, and here we tell it to stop at 4

it.loops <= 4

and stop if we reach our destination along the path

it.object.id != destination_node_id

We avoid repeating elements in the path with

simplePath

and we keep only the results that end at our destination node with

filter{it.id == destination_node_id}

and get the names of the nodes in the path with

paths{it.name}
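
To get a feel for what the loop, simplePath and filter steps add up to, here is a rough plain-Ruby imitation that walks a small in-memory graph. Everything in it is an assumption for illustration: the graph is a hypothetical stand-in with mutual friendships, paths_between is a made-up name, and the cap of four hops is my reading of the loop condition rather than a statement about Gremlin internals.

```ruby
# Hypothetical in-memory graph: name => people with a "friends"
# relationship pointing at that person.
INCOMING = {
  "Johnathan" => ["Mark"],
  "Mark"      => ["Johnathan", "Mary", "Phil"],
  "Mary"      => ["Mark", "Phil"],
  "Phil"      => ["Mark", "Mary"]
}.freeze

# Collects every path from start to destination that repeats no node and
# uses at most max_hops relationships.
def paths_between(start, destination, max_hops = 4, path = [start])
  return [path] if path.last == destination   # stop once we arrive
  return [] if path.size - 1 >= max_hops      # hop cap reached, give up
  INCOMING.fetch(path.last, []).flat_map do |neighbor|
    next [] if path.include?(neighbor)        # like simplePath: no repeats
    paths_between(start, destination, max_hops, path + [neighbor])
  end
end

paths_between("Johnathan", "Mary").each do |path|
  puts "#{path.size - 1} degrees: " + path.join(' => friends => ')
end
```

On this sample graph the sketch finds one two-hop path and one three-hop path, though not necessarily in the order the server returns them.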

We then pass our parameters

{:start_node_id => start_node_id,
 :destination_node_id => destination_node_id }
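
To see why passing parameters beats building the id into the script, compare the two approaches side by side. This is only a sketch: parameterized_call and interpolated_script are hypothetical helpers that build the arguments without talking to a server.

```ruby
# One constant script: the text is identical for every call, so only the
# parameters vary from request to request.
SUGGESTIONS_SCRIPT = "g.v(node_id).in('friends').in('friends')." \
                     "dedup.filter{it != g.v(node_id)}.name"

# Hypothetical helper: returns the two arguments you would hand to
# @neo.execute_script(script, params).
def parameterized_call(node_id)
  [SUGGESTIONS_SCRIPT, { :node_id => node_id }]
end

# The approach to avoid: the id is baked into the script text, so every
# node produces a different script.
def interpolated_script(node_id)
  "g.v(#{node_id}).in('friends').in('friends')." \
  "dedup.filter{it != g.v(#{node_id})}.name"
end

script_a, params_a = parameterized_call(1)
script_b, params_b = parameterized_call(2)
puts script_a == script_b                               # script text is shared
puts params_a == params_b                               # only the params differ
puts interpolated_script(1) == interpolated_script(2)   # text differs per node
```

With the parameterized form the server keeps seeing the same script text, which is what lets it skip the re-parsing mentioned above.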

Gremlin is very powerful and is aided by the whole Tinkerpop stack. You’ll need some time and a little something to get you in the right state of mind, but if you can grok it then reading the Gremlin wiki will expand your mind. If you prefer to watch instead of read, this video will give you a pretty good introduction to the Gremlin language.


10 thoughts on “Gremlin with Neography”

  1. Max,

    great post.

    Regarding: node_id = node["self"].split('/').last
    Didn’t the node have a method for retrieving the id?

    And for the gremlin scripts it is better to use variables.
    So instead of: g.v(#{node_id})

    use g.v(node_id)
    and pass a JSON hash with {:node_id => node_id} to the execute-script (and to the gremlin-plugin). That will avoid re-parsing of the scripts and memory burn on the server.

  2. maxdemarzi says:

    >Regarding: node_id = node["self"].split('/').last
    >Didn’t the node have a method for retrieving the id?

    Yes. With sugar it would be node.neo_id, but when we are just passing a hash we need to do a little work.

    >And for the gremlin scripts it is better to use variables.
    Yup I see that now. I will tweak neography to take this into account and update the post.

    • maxdemarzi says:

      As of Neography 0.0.20 we have parameters for gremlin scripts and cypher queries. Post has been updated to reflect this.

  3. Wow, that was fast. And congrats on the 20th release!

  4. [...] Framework directly. In upcoming posts, I’ll show you two more ways to traverse the graph via Gremlin and Cypher as well as many more things you can do with [...]

  5. Hey Max,

    Thanks for the post. We at YourNextLeap have been tinkering with Neo4J for our Beta releases. We had locked our Gemfile at neography v0.0.14 and were using the old javascript method to filter nodes.

    I’m glad to see a post on Gremlin with Neography. This certainly serves as my reference point for advanced traversal. :)

    Cheers!

  6. [...] to find friends of friends and degrees of separation with the Neo4j REST API and a little bit of the Gremlin and Cypher languages. However, all we’ve seen is a little bit of text output. We [...]

  7. Zhemin Lin says:

    Hi Max,

    I tried it a little bit, but the result confused me.
    In my Neo4j, I created 4 vertices A (187222), B (187223), C (187224), D (187226).
    A and B are mutual friends.
    A and C are mutual friends.
    A and D are mutual friends.
    B and C are mutual friends.
    B and D are mutual friends.

    And I typed in Neo4j’s Gremlin console:
    gremlin> g.v(187222).in('FRIENDS').in('FRIENDS').dedup.filter{it != g.v(187222)}.name
    ==> c
    ==> d
    ==> b

    What should I do if I want to get a “null” (because A knows everyone)?

    And, C and D are not friends. But I typed:
    gremlin> g.v(187226).in('FRIENDS').in('FRIENDS').dedup.filter{it != g.v(187226)}.name
    ==> b
    ==> c
    ==> a

    What should I do to see a “c”?

    Thank you!

    • maxdemarzi says:

      You have mutual friendships, so there are two relationships for each pair you mentioned up there. Not sure what you mean by getting a null. Actually I’m not sure what you are trying to find.

  8. Peter Boling says:

    With the friend of friend example I am getting back 1st degree friends. My wife and I are friends (on Facebook) and we have some mutual friends. When I get the suggestions back for who I should become friends with I get back some who are *already* my friends, but are also her friends.

    Strangely I also am getting back my wife.

    How do I filter out those who I am already friends with?
