Neo4j on Heroku – Part One

On his blog, Marko A. Rodriguez showed us how to make “A Graph-Based Movie Recommender Engine with Gremlin and Neo4j”.

In this two-part series, we are going to take his work from the Gremlin shell and put it on the web, using the Heroku Neo4j add-on and adapting the Neovigator project to our use case. Heroku's Dev Center has a great article on how to get an example Neo4j application up and running, and Michael Hunger shows you how to add JRuby extensions and provides sample code using the Neo4j.rb gem by Andreas Ronge.

We are going to follow their recipe, but we are going to add a little spice. Instead of creating a small graph with 2 nodes and 1 relationship, I am going to show you how to leverage the power of Gremlin and Groovy to build a much larger graph from a set of files.

Let’s start by cloning the Neoflix Sinatra application. Instead of installing and starting Neo4j locally, we are going to create a Heroku application and add Neo4j to it.

git clone git@github.com:maxdemarzi/neoflix.git
cd neoflix
bundle install
heroku apps:create neoflix --stack cedar
heroku addons:add neo4j
git push heroku master

Let’s make sure that Neo4j was successfully added to our project:

$ heroku addons
logging:basic
neo4j:test
releases:basic

Great, there it is (if you are reading this in the future it might say neo4j:basic or neo4j:silver or something like that). So where is our Neo4j database exactly?

$ heroku config
GEM_PATH       => vendor/bundle/ruby/1.9.1
LANG           => en_US.UTF-8
NEO4J_HOST     => 70825a524.hosted.neo4j.org
NEO4J_INSTANCE => 70825a524
NEO4J_LOGIN    => xxxxxxxx
NEO4J_PASSWORD => yyyyyyyy
NEO4J_PORT     => 7014
NEO4J_REST_URL => http://xxxxxxxx:yyyyyyyy@70825a524.hosted.neo4j.org:7014/db/data
NEO4J_URL      => http://xxxxxxxx:yyyyyyyy@70825a524.hosted.neo4j.org:7014
PATH           => bin:vendor/bundle/ruby/1.9.1/bin:/usr/local/bin:/usr/bin:/bin
RACK_ENV       => production

The xs and ys are our username and password. We can use the address given in NEO4J_URL to take a look at the server. For Part Two, it would be wise to keep an eye on the “dashboard” as we create new nodes and relationships.

The Neoflix project layout:

neoflix.rb
public/movies.dat
public/users.dat
public/ratings.dat

Let’s take a look at the source code in neoflix.rb. We require our gems and use the NEO4J_URL environment variable to tell Neography how to reach the Neo4j server, falling back to a local server when the variable is not set.

require 'rubygems'
require 'neography'
require 'sinatra'

neo = Neography::Rest.new(ENV['NEO4J_URL'] || "http://localhost:7474")
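
To see what that URL carries, here is a minimal sketch using only Ruby's standard URI library. The helper name neo4j_connection_parts is made up for illustration and is not part of Neoflix or Neography; the fallback URL is the same local default used above.

```ruby
require 'uri'

# Hypothetical helper (not part of Neoflix): split a Neo4j URL into its parts.
def neo4j_connection_parts(url)
  uri = URI.parse(url)
  { host: uri.host, port: uri.port, login: uri.user, password: uri.password }
end

# Same fallback as neoflix.rb: use the local server when NEO4J_URL is not set.
parts = neo4j_connection_parts(ENV['NEO4J_URL'] || "http://localhost:7474")
puts "Neo4j server: #{parts[:host]}:#{parts[:port]}"
puts "Credentials present: #{!parts[:login].nil?}"
```

On Heroku the login and password come straight out of the URL, which is why a single config variable is enough to connect.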

Then we create a route in Sinatra that will clear and populate the graph when we visit it.

get '/create_graph' do
  neo.execute_script("g.clear();")
  create_graph(neo)
end

We use a Gremlin shortcut to delete the graph before creating it.

g.clear();
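
A call like execute_script ends up as an HTTP request to the Neo4j server. The sketch below builds (but does not send) such a request with Ruby's standard library. It assumes the Gremlin plugin endpoint that Neo4j 1.x servers exposed at /db/data/ext/GremlinPlugin/graphdb/execute_script; treat that path and the helper name build_gremlin_request as assumptions for illustration, not as a description of Neography's internals.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumed endpoint of the Neo4j 1.x Gremlin plugin (an assumption, not taken from this post).
GREMLIN_PATH = '/db/data/ext/GremlinPlugin/graphdb/execute_script'

# Hypothetical helper: build, but do not send, a POST carrying a Gremlin script.
def build_gremlin_request(base_url, script)
  uri = URI.parse(base_url)
  request = Net::HTTP::Post.new(GREMLIN_PATH)
  # Credentials, when present, travel as HTTP basic auth.
  request.basic_auth(uri.user, uri.password) if uri.user
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate('script' => script)
  request
end

request = build_gremlin_request(ENV['NEO4J_URL'] || 'http://localhost:7474', 'g.clear();')
puts request.path
puts request.body
```

Nothing here touches the network; it only shows that the Gremlin script is ordinary text inside a JSON payload.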

The Backup and Restore feature of the Heroku Add-on lets you reload your graph as well, but the Neo4j instance will be down temporarily during the exchange.

If you want to permanently delete the Neo4j instance (once you are done with this example application), you can simply remove the Heroku add-on.

heroku addons:remove neo4j:test
Removing neo4j:test from neoflix...done.

Let’s see part of the create_graph method.

We do not want to create the graph if it already exists. So we check to see if there are any Movie nodes before starting.

def create_graph(neo)
  return if neo.execute_script("g.idx('vertices')[[type:'Movie']].count();").to_i > 0

Since we wiped everything clean, we set up automatic indexing on all vertices and all properties.

if neo.execute_script("g.indices;").empty?  
  neo.execute_script("g.createAutomaticIndex('vertices', Vertex.class, null);") 
end

We are going to create a lot of data, so we set our graph to commit every 1000 changes in an automatic transaction.

g.setMaxBufferSize(1000);
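
To make the idea of a commit buffer concrete, here is a toy Ruby model. It only illustrates the behavior described above, with made-up names (ChangeBuffer, change!, commit!); it is not how Blueprints or Neo4j implement transactions.

```ruby
# Hypothetical illustration of "commit every N changes"; not the real implementation.
class ChangeBuffer
  attr_reader :commits, :pending

  def initialize(max_buffer_size)
    @max_buffer_size = max_buffer_size
    @pending = 0   # changes made since the last commit
    @commits = 0   # number of commits so far
  end

  # Record one change and commit automatically once the buffer is full.
  def change!
    @pending += 1
    commit! if @pending >= @max_buffer_size
  end

  # Commit whatever is pending; does nothing when the buffer is empty.
  def commit!
    return if @pending.zero?
    @commits += 1
    @pending = 0
  end
end

buffer = ChangeBuffer.new(1000)
2500.times { buffer.change! }
puts buffer.commits   # two automatic commits so far
puts buffer.pending   # 500 changes still waiting
buffer.commit!        # an explicit final commit flushes the remainder
puts buffer.commits
```

The last step matters: with a buffer of 1000 and 2500 changes, the final 500 are only written once something commits them explicitly.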

Here comes some magic. We do not have access to the file system of the server running our Neo4j instance, but since we have the full power of Groovy at our disposal, we simply grab the file from Sinatra instead. Anything you put in the public directory will be automatically served for you. The fields of movies.dat are delimited by “::” and the genres (spelled “genera” in the code) are delimited by “|”.

1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
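
To make the file format concrete, here is a small Ruby sketch that parses one such line. The helper name parse_movie_line is hypothetical and is not part of Neoflix, which does this work in Groovy on the server. Note that Ruby's String#split treats a String argument as a literal delimiter, so the pipe needs no escaping here.

```ruby
# Hypothetical helper: turn one line of movies.dat into a Hash.
def parse_movie_line(line)
  # Fields are separated by "::"; the third field holds "|"-separated genres.
  movie_id, title, genres = line.chomp.split('::')
  { movie_id: movie_id.to_i, title: title, genres: genres.split('|') }
end

movie = parse_movie_line("1::Toy Story (1995)::Animation|Children's|Comedy")
puts movie[:movie_id]
puts movie[:title]
puts movie[:genres].inspect
```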

So for each line in our file, we are going to create a movie vertex and link it to one or more genres. We are sending this Gremlin script inside a Ruby String, so we must escape the backslashes that escape the | in the final script. As we go along, we also create vertices for the genres if they don’t already exist.

'http://neoflix.heroku.com/movies.dat'.toURL().eachLine { def line ->
  def components = line.split('::');
  def movieVertex = g.addVertex(['type':'Movie', 
                                 'movieId':components[0].toInteger(), 
                                 'title':components[1]]);
  components[2].split('\\\\|').each { def genera ->
    def hits = g.idx(Tokens.T.v)[[genera:genera]].iterator();
    def generaVertex = hits.hasNext() ? hits.next() : g.addVertex(['type':'Genera', 
                                                                   'genera':genera]);
    g.addEdge(movieVertex, generaVertex, 'hasGenera');
  }
};

If you are a Rubyist, you should be able to read that Groovy code, but let me point out a few things. In Groovy variable definitions, it is mandatory to either provide a type name explicitly or to use “def” in its place.

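This funky piece of code needs some explaining. Groovy’s split takes a regular expression, so the pipe character must be escaped with a backslash. That backslash sits inside a Groovy string literal, so it must itself be escaped, and the whole script sits inside a Ruby String, so each of those backslashes must be escaped once more. That is how we end up with four backslashes in front of the pipe.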

components[2].split('\\\\|').each { def genera ->
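
The layers can be checked from Ruby itself. This sketch only simulates how Groovy reads a single-quoted literal (each doubled backslash collapses to one); it does not run Groovy, and the helper name simulate_groovy_literal is made up for the illustration.

```ruby
# Simulates (does not run) Groovy's reading of a single-quoted string literal:
# each doubled backslash collapses to a single one.
def simulate_groovy_literal(text)
  text.gsub("\\\\") { "\\" }
end

# The fragment as written inside a double-quoted Ruby string.
ruby_source = "components[2].split('\\\\|')"

# Layer 1: Ruby collapses each pair of backslashes, giving the text sent to the server.
puts ruby_source

# Layer 2: the contents of the Groovy literal, after Groovy's own unescaping.
groovy_literal = ruby_source[/'(.*)'/, 1]
regex_source = simulate_groovy_literal(groovy_literal)
puts regex_source

# Layer 3: used as a regular expression, the result matches a literal pipe.
p "Animation|Children's|Comedy".split(Regexp.new(regex_source))
```

Four backslashes in the Ruby source become two in the script that reaches the server, and one in the regular expression that finally does the splitting.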

This next bit of code looks up the genre in our index and, if it doesn’t exist, creates it.

def hits = g.idx(Tokens.T.v)[[genera:genera]].iterator();
def generaVertex = hits.hasNext() ? hits.next() : g.addVertex(['type':'Genera', 
                                                               'genera':genera]);

This construct, which looks like a Hash inside an Array inside an Array, is Gremlin’s way of querying the index. We are telling it to return a node if it has a property genera that matches the genera variable we parsed after splitting the components[2] field.

g.idx(Tokens.T.v)[[genera:genera]].iterator();
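
The same look-up-or-create pattern can be sketched in plain Ruby, with a Hash standing in for the vertex index. The class GenreIndex and its method names are hypothetical and exist only for this illustration; the real work happens in Gremlin on the server.

```ruby
# Hypothetical in-memory stand-in for the graph's vertex index.
class GenreIndex
  attr_reader :vertices

  def initialize
    @vertices = []   # every vertex created so far
    @by_genre = {}   # index: genre name => vertex
  end

  # Return the indexed vertex if there is one; otherwise create and index it.
  def find_or_create(genre)
    @by_genre[genre] ||= begin
      vertex = { type: 'Genera', genera: genre }
      @vertices << vertex
      vertex
    end
  end
end

index = GenreIndex.new
%w[Comedy Romance Comedy].each { |genre| index.find_or_create(genre) }
puts index.vertices.size
```

Asking for “Comedy” twice creates only one vertex, which is what keeps every movie of a genre pointing at the same node.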

We do this a few more times to load the users and ratings into our graph and end with this:

g.stopTransaction(TransactionalGraph.Conclusion.SUCCESS);")

This commits any leftover items in our transaction buffer.

In Part Two, we’ll bring up our Heroku app, load the data, possibly add movie posters from a third-party API, and visualize some of the implicit relationships in the graph as outlined in the original blog post. I’ll probably also do a Part Three, which will use the fresh-off-the-presses CSV File Importer to reload the graph with a bigger set of movie data using Heroku. In between, however, I think it’s time we looked at Neo4j Spatial. You’ll know when new posts are published by following me on Twitter.


Comments on “Neo4j on Heroku – Part One”

  Nada Amin says:

    Thanks for posting this informative series. I created a port of your project to Scala & Play 2.0. I actually don’t see any advantages to what I’ve done :-P At first, I wanted to use the gremlin-scala plugin, but it’s not a good idea to do so over the RESTful API.

    Anyways, you can see it here: http://github.com/namin/neoflix-scala