Neo4j on Heroku – Part Two

We are picking up where we left off in Neo4j on Heroku – Part One, so make sure you’ve read it or you’ll be a little lost. So far, we have cloned the Neoflix project, set up our Heroku application and added the Neo4j add-on to it. We are now ready to populate our graph.

UPDATE: Learn a better way to create the graph in part 3 of my Neo4j on Heroku series.

Bring up two browser windows. In one, open your Neo4j instance running on Heroku (the NEO4J_URL from your app’s config),

$ heroku config
NEO4J_URL      => http://xxxxxxxx:yyyyyyyy@70825a524.hosted.neo4j.org:7014

and in the other, open the create_graph route of your app. If you named your app neoflix, that would be neoflix.herokuapp.com/create_graph.

This runs the create_graph method, and you’ll see nodes and relationships being created on the Neo4j Dashboard. There are just over a million relationships, so it will take a few minutes. There are faster ways to load data into Neo4j (wait for part three of this series), but this will work in our case.

The fine folks at themoviedb.org provide an API for developers who want to integrate movie and cast data, along with posters and fan art. You can request an API key and they’ll respond very quickly. So let’s add the key to our Heroku config:

$ heroku config:add TMDB_KEY=XXXXXXX
Adding config vars and restarting app... done, vXX
  TMDB_KEY => XXXXXXX

To test locally, export the same variable in your shell:

export TMDB_KEY=XXXXXXX
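If the variable is missing, the app would typically only find out when the first TMDb request fails. A small guard can catch that at startup instead; the `fetch_tmdb_key` helper below is hypothetical, not part of Neoflix:

```ruby
# Hypothetical helper: read the TMDb API key from the environment and fail
# fast if it is missing, instead of discovering it on the first API call.
def fetch_tmdb_key(env = ENV)
  key = env['TMDB_KEY'].to_s.strip
  if key.empty?
    raise KeyError, 'TMDB_KEY is not set (heroku config:add TMDB_KEY=... or export TMDB_KEY=...)'
  end
  key
end

puts fetch_tmdb_key('TMDB_KEY' => 'abc123') # prints abc123
```

In the app you would call it with no argument, e.g. `Tmdb.api_key = fetch_tmdb_key`, so it reads the real `ENV`.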

We can now use this environment variable in our application, along with the ruby-tmdb gem by Aaron Gough:

require 'cgi'       # CGI::escape
require 'ruby-tmdb'

Tmdb.api_key = ENV['TMDB_KEY']
Tmdb.default_language = "en"

  # Find the movie on TMDb by title and build the HTML shown beside the graph
  def get_poster(data)
    movie = TmdbMovie.find(:title => CGI::escape(data["title"] || ""), :limit => 1)
    if movie.empty?
      "No Movie Poster found"
    else
      # Poster, tagline, rating and overview, wrapped in a link to movie.url
      "<a href='#{movie.url}' target='_blank'>
         <img src='#{movie.posters.first.url}'>
         <h3>#{movie.tagline}</h3>
         <p>Rating: #{movie.rating} <br />
            Rated: #{movie.certification}</p><p>#{movie.overview}</p>
       </a>"
    end
  end
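The method above drops TMDb’s tagline and overview straight into HTML. If you want to guard against quotes or angle brackets in that text, you could escape each value with Ruby’s `CGI.escapeHTML`. A minimal sketch, using a hypothetical `PosterMovie` struct in place of the ruby-tmdb movie object so it runs without an API key:

```ruby
require 'cgi'

# Hypothetical stand-in for the ruby-tmdb movie object, holding only the
# fields the details HTML uses.
PosterMovie = Struct.new(:url, :poster_url, :tagline, :rating, :certification, :overview)

# Build the details HTML, escaping every value so quotes or angle brackets in
# TMDb's text cannot break the markup.
def poster_html(movie)
  h = ->(value) { CGI.escapeHTML(value.to_s) }
  "<a href='#{h.(movie.url)}' target='_blank'>" \
    "<img src='#{h.(movie.poster_url)}'>" \
    "<h3>#{h.(movie.tagline)}</h3>" \
    "<p>Rating: #{h.(movie.rating)} <br />" \
    "Rated: #{h.(movie.certification)}</p>" \
    "<p>#{h.(movie.overview)}</p></a>"
end

movie = PosterMovie.new('http://example.com/movie/1', 'http://example.com/poster.jpg',
                        'Tom & Jerry', 7.5, 'PG', '<b>Bold</b> overview')
puts poster_html(movie)
```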

We will visualize the graph using Neovigator, as I showed you earlier, but instead of retrieving the properties of our node (since they’re pretty bland), we’ll request a movie poster.

We will not visualize the explicit relationships we created. Instead, we will visualize the implicit movie recommendation graph. Let’s take a look at that method now:

  def get_recommendations(neo, node_id)
    rec = neo.execute_script("m = [:];
                              x = [] as Set;
                              v = g.v(node_id);

                              v.
                              out('hasGenera').
                              aggregate(x).
                              back(2).
                              inE('rated').
                              filter{it.getProperty('stars') > 3}.
                              outV.
                              outE('rated').
                              filter{it.getProperty('stars') > 3}.
                              inV.
                              filter{it != v}.
                              filter{it.out('hasGenera').toSet().equals(x)}.
                              groupCount(m){\"${it.id}:${it.title.replaceAll(',',' ')}\"}.iterate();
 
                              m.sort{a,b -> b.value <=> a.value}[0..24];",
                              {:node_id => node_id.to_i})

    return [{"id" => node_id,
             "name" => "No Recommendations",
             "values" => [{"id" => "#{node_id}",
                           "name" => "No Recommendations"}]
            }] if rec == "{}"

    # Strip both surrounding braces, then split each "id:title" entry on its first colon
    values = rec[1..-2].split(',').collect{ |v| {:id => v.split(':', 2)[0].strip,
                                                 :name => v.split(':', 2)[1] } }

    [{"id" => node_id ,"name" => "Recommendations","values" => values }]
  end
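To see what the traversal computes without a running Neo4j instance, here is a plain-Ruby sketch of the same item-based recommendation over in-memory data. The `recommend` method and its hash/array inputs are hypothetical stand-ins for the graph, and ties are broken by id, which the Gremlin version does not do:

```ruby
require 'set'

# genres:  movie id => genre names          (the 'hasGenera' edges)
# ratings: [user, movie id, stars] triples  (the 'rated' edges)
def recommend(movie, genres, ratings, limit = 25)
  target_genres = genres.fetch(movie, []).to_set              # aggregate(x)

  # Users who rated the starting movie above 3 stars (inE('rated').filter.outV)
  fans = ratings.select { |_, m, stars| m == movie && stars > 3 }
                .map { |user, _, _| user }
                .to_set

  counts = Hash.new(0)                                          # groupCount(m)
  ratings.each do |user, other, stars|
    next unless fans.include?(user) && stars > 3                # outE('rated').filter
    next if other == movie                                      # filter{it != v}
    next unless genres.fetch(other, []).to_set == target_genres # same genre set as x
    counts[other] += 1
  end

  # Highest count first (ties broken by id), then keep the top `limit`
  counts.sort_by { |id, count| [-count, id] }.first(limit)
end

genres = { 1 => ['Drama'], 2 => ['Drama'], 3 => ['Crime', 'Drama'], 4 => ['Drama'] }
ratings = [
  ['ann', 1, 5], ['ann', 2, 4], ['ann', 3, 5], ['ann', 4, 2],
  ['bob', 1, 4], ['bob', 2, 5], ['bob', 4, 5],
  ['cat', 1, 2], ['cat', 4, 5]
]
p recommend(1, genres, ratings) # => [[2, 2], [4, 1]]
```

Movie 3 is skipped because its genre set differs, and cat’s rating of movie 4 is ignored because cat did not rate movie 1 above 3 stars.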

Let’s go through the code. In Groovy, [:] is an empty map (the equivalent of a Ruby Hash), and a map is ultimately what we want to return, so we create an empty one and fill it later. Then we create a Set “x” (an unordered collection with no duplicates; see Groovy’s List for ordered collections). We also get our starting vertex and assign it to “v”.

m = [:];
x = [] as Set;
v = g.v(node_id);
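If you are more at home in Ruby, these collections map onto familiar types. A quick illustration in plain Ruby (not Gremlin) of the two, and of why a Set suits the genre comparison that comes later:

```ruby
require 'set'

m = {}        # Groovy [:]        => an empty Ruby Hash
x = Set.new   # Groovy [] as Set  => an empty Set (no duplicates, no order)

x << 'Drama' << 'Comedy' << 'Drama'
puts x.size                        # 2: the duplicate 'Drama' is ignored
puts x == Set['Comedy', 'Drama']   # true: insertion order does not matter
```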

We fill the empty Set with the genres of our movie, so we can compare the genres of other movies against it later on.

v.
out('hasGenera').
aggregate(x).

We then go back two steps, which puts us at our starting movie, and go to the users who rated the movie with more than 3 stars.

back(2).
inE('rated').
filter{it.getProperty('stars') > 3}.

From these users, we step out to find all the movies they have also rated with more than 3 stars.

outV.
outE('rated').
filter{it.getProperty('stars') > 3}.

…keeping only the ones that are not our starting movie (remember we assigned it to the variable “v”).

inV.
filter{it != v}.

…and we check that these movies have exactly the same genres as our starting movie (remember we filled the Set “x” with them).

filter{it.out('hasGenera').toSet().equals(x)}.
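Note that `equals` asks for exactly the same genre set, so a movie with one extra genre is dropped. The difference from a looser “has at least these genres” rule is easy to see with Ruby’s Set; the titles and genres here are made up for illustration, and the subset test is an alternative, not what the script does:

```ruby
require 'set'

target = Set['Crime', 'Drama']   # genres of the starting movie (x)

candidates = {
  'Heat'   => Set['Crime', 'Drama'],
  'Casino' => Set['Crime', 'Drama', 'Thriller'],
  'Drive'  => Set['Crime']
}

exact  = candidates.select { |_, g| g == target }.keys        # like equals(x)
looser = candidates.select { |_, g| target.subset?(g) }.keys  # at least x's genres

p exact   # ["Heat"]
p looser  # ["Heat", "Casino"]
```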

groupCount does what it sounds like: it counts how many times each movie is reached and stores the tallies in the map “m” we created earlier. We want the id, title and count, so the closure builds each key as “id:title” (with any commas in the title replaced by spaces; I’ll tell you why in a minute), and then we call iterate(). The Gremlin shell iterates automatically for you, but since we’re sending this Gremlin script over the REST API, it doesn’t. One day you’ll be pulling out your hair trying to figure out what’s wrong, and you’ll curse “iterate” once you figure it out…

groupCount(m){\"${it.id}:${it.title.replaceAll(',',' ')}\"}.iterate();
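Conceptually, groupCount with a key closure is a tally keyed on the closure’s result. A plain-Ruby equivalent (the `group_count` name and the list of reached movies are hypothetical) shows the key format; after the replacement, the only commas left in the stringified map are the ones between entries:

```ruby
# Hypothetical traversal output: one [id, title] pair per path that reached a movie.
def group_count(reached)
  counts = Hash.new(0)
  reached.each do |id, title|
    # Same key as the Groovy closure: "id:title", with commas replaced by spaces
    counts["#{id}:#{title.tr(',', ' ')}"] += 1
  end
  counts
end

reached = [[7, 'Crouching Tiger, Hidden Dragon'], [9, 'Heat'], [7, 'Crouching Tiger, Hidden Dragon']]
p group_count(reached)
```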

Here we sort our map by count (each entry’s value), highest first, and take the top 25 entries.

m.sort{a,b -> b.value <=> a.value}[0..24];",
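In Ruby terms, this is a descending sort on the counts followed by taking the first 25. A small sketch (the `top_entries` helper is hypothetical):

```ruby
# Sort "id:title" => count pairs by count, highest first, and keep the top n.
# first(n) simply returns fewer entries when there are fewer than n.
def top_entries(counts, n = 25)
  counts.sort_by { |_, count| -count }.first(n)
end

counts = { '9:Heat' => 1, '7:Ronin' => 3, '4:Thief' => 2 }
p top_entries(counts, 2)
```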

Since Neo4j will be executing this script many times over, you want to parameterize it so it is parsed only once: node_id is passed in as a parameter instead of being pasted into the script text.

{:node_id => node_id.to_i})
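As I understand it, neography posts the script and its parameters as JSON to the Neo4j 1.x Gremlin plugin endpoint (`/db/data/ext/GremlinPlugin/graphdb/execute_script`), with a `script` field and a `params` object. The sketch below only builds such a body (the `gremlin_payload` helper is hypothetical and sends nothing), to show that the script text stays identical from request to request while only the parameters change:

```ruby
require 'json'

# Build a JSON body with the script text and its parameters kept separate.
def gremlin_payload(script, params = {})
  JSON.generate('script' => script, 'params' => params)
end

script = 'g.v(node_id).out("hasGenera")'
body = gremlin_payload(script, 'node_id' => 42)
puts body
```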

If we get an empty map back (the string "{}"), we return an unfortunate “No Recommendations” message:

return [{"id" => node_id,
         "name" => "No Recommendations",
         "values" => [{"id" => "#{node_id}",
                       "name" => "No Recommendations"}]
        }] if rec == "{}"

Finally, we turn the Groovy map, which comes back as a string, into an array of hashes for our Neovigator visualization. Notice I’m splitting the result on commas (which is why we replaced the commas in titles earlier). This piece won’t be necessary very soon, as the final version of Neo4j 1.6 will have JSON support for Groovy maps.

values = rec[1..-2].split(',').collect{ |v| {:id => v.split(':', 2)[0].strip,
                                             :name => v.split(':', 2)[1] } }
[{"id" => node_id ,"name" => "Recommendations","values" => values }]
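If you want the parsing to be a little more forgiving, here is a standalone sketch (the `parse_recommendations` helper is hypothetical). It assumes the map arrives as a string in Java’s `{key=value, ...}` form, so each entry looks like `id:title=count`; it splits each entry on its first colon only, so titles such as “Alien: Resurrection” survive, and it drops the trailing count:

```ruby
# Parse a stringified map such as "{7:Ronin=3, 9:Heat=1}" into the
# array of hashes used for the "values" of the visualization.
def parse_recommendations(rec)
  body = rec.strip.sub(/\A\{/, '').sub(/\}\z/, '') # drop the surrounding braces
  return [] if body.empty?

  body.split(',').map do |entry|
    id, rest = entry.strip.split(':', 2) # split on the first ':' only
    name = rest.to_s.sub(/=\d+\z/, '')   # drop a trailing "=count", if present
    { :id => id, :name => name }
  end
end

p parse_recommendations('{7:Ronin=3, 12:Alien: Resurrection=2}')
```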

We cache the movie poster and its recommendations for 30 days (max-age=2592000 seconds) by taking advantage of the Varnish cache Heroku provides. We then get our starting node either by id or by title.

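  get '/resources/show' do
    # Let Heroku's Varnish cache keep this response for 30 days (2,592,000 seconds)
    response.headers['Cache-Control'] = 'public, max-age=2592000'
    content_type :json

    # is_numeric? is not a core Ruby method; it must be defined elsewhere in the app
    if params[:id].is_numeric?
      node = neo.get_node(params[:id])
    else
      # Look the movie up by title in the automatic index. The title is passed as a
      # script parameter rather than interpolated, so quotes in titles can't break it.
      node = neo.execute_script("g.idx(Tokens.T.v)[[title:title]].next();",
                                {:title => CGI::unescape(params[:id])})
    end

    id = node_id(node)

    {:details_html => "<h2>#{get_name(node["data"])}</h2>" + get_poster(node["data"]),
     :data => {:attributes => get_recommendations(neo, id),
               :name => get_name(node["data"]),
               :id => id}
     }.to_json
  end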
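The two small pieces of logic at the top of this route can be checked in isolation. The helper names below (`cache_control`, `numeric_id?`) are hypothetical; `numeric_id?` is just one plausible way to implement the `is_numeric?` check the route relies on, not the app’s actual code:

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Cache-Control value for a public response cached for the given number of days.
def cache_control(days)
  "public, max-age=#{days * SECONDS_PER_DAY}"
end

# True only for strings made entirely of digits, such as a node id.
def numeric_id?(value)
  !!(value.to_s =~ /\A\d+\z/)
end

puts cache_control(30)        # public, max-age=2592000
p numeric_id?('1234')         # true
p numeric_id?('The Matrix')   # false
```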

By title? Yes: we are adding jQuery UI autocomplete to our application. It passes the name of the movie, which we look up in the automatic index we created.

node = neo.execute_script("g.idx(Tokens.T.v)[[title:title]].next();",
                          {:title => CGI::unescape(params[:id])})

…and there you have it: your very own movie recommendation website on Heroku. See the complete code at github.com/maxdemarzi/neoflix.
