The “last mile” is a term from the telecommunications industry that refers to delivering connectivity to the customers who will actually be using the system. For graph databases, it refers to how well the end user can extract value and insight from the graph. We’ve already seen one example of this concept with Graph Search, which lets a user express requests in natural language. Today we’ll see another. We’ll be taking advantage of the features of Neo4j 2.0 to make this work, so be sure to read the previous post on the matter first.
We’re going to be using VisualSearch.js made by Samuel Clay of NewsBlur. VisualSearch.js enhances ordinary search boxes with the ability to autocomplete faceted search queries. It is quite easy to customize and there is an annotated walkthrough of the options available. You can see what it does in the image below, or click it to try their demo.
We’ve previously prepared a Neo4j 2.0 graph with Actors, Directors, Producers, Writers, and Users, all connected to Movies.
The first thing we need is the set of facets for VisualSearch.js. We don’t want to configure these by hand, because that would be painful and our graph may change over time. Instead, we’ll use the “list_labels” method to get the Labels in our graph:
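    require 'sinatra'
    require 'json'
    # assumes $neo is a Neo4j REST client (e.g. a Neography::Rest instance) created elsewhere

    get '/facets' do
      content_type :json
      cache_control :public, :max_age => 600   # let clients cache the facet list for 10 minutes
      facets = []
      categories = $neo.list_labels            # every Label in the graph
      categories.each do |cat|
        get_properties(cat).each do |property|
          # one facet per Label/property pair, e.g. "Actor.name" grouped under "Actor"
          facets << {:category => cat, :label => "#{cat}.#{property}"}
        end
      end
      facets.to_json
    end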
One nice thing we can do is group the properties of each Label together. We don’t have a hard schema for which properties each Label has, but we can query the graph, grab one node with that Label, and see which properties it has (this assumes nodes sharing a Label carry the same properties).
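    def get_properties(category)
      # Labels have no fixed schema, so sample a single node and use its property keys
      cypher = "MATCH n:#{category} RETURN n LIMIT 1"
      row = $neo.execute_query(cypher)["data"].first
      row ? row.first["data"].keys : []   # a Label with no nodes yields no properties
    end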
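The indexing in get_properties depends on the shape of the response from the $neo client’s execute_query, assumed here to be Neo4j’s legacy REST Cypher endpoint: a "data" array of rows, each row an array of columns, and each node column a hash whose "data" entry holds the node’s properties. The sketch walks a hand-written, abridged sample of that shape so the indexing can be checked without a running server. SAMPLE_RESPONSE, its values, and property_keys are illustrative assumptions, not output captured from the demo.

```ruby
# Hand-written, abridged sample of a REST Cypher response for
# "MATCH n:Actor RETURN n LIMIT 1" (assumed shape; values are illustrative only).
SAMPLE_RESPONSE = {
  "columns" => ["n"],
  "data" => [
    [
      { "self" => "http://localhost:7474/db/data/node/1",
        "data" => { "name" => "Zach Grenier", "born" => 1954 } }
    ]
  ]
}

# Hypothetical helper mirroring the indexing used in get_properties.
def property_keys(response)
  row = response["data"].first   # first result row (nil when the Label has no nodes)
  return [] if row.nil?
  row.first["data"].keys         # column 0 is the node; its "data" hash holds the properties
end

p property_keys(SAMPLE_RESPONSE)  # => ["name", "born"]
```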
The /facets route then returns a JSON array that looks like this:
    [{"category":"Writer","label":"Writer.born"},
     {"category":"Writer","label":"Writer.name"},
     {"category":"Actor","label":"Actor.born"},
     {"category":"Actor","label":"Actor.name"} ...
We will pass this on to VisualSearch.js and get our first drop-down working with these grouped Label properties.
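The /facets route is a nested loop over Labels and their property keys. To see the output shape without a database, here is a minimal sketch in which a plain Hash stands in for list_labels plus get_properties. build_facets is a hypothetical name, not part of the original app.

```ruby
require "json"

# Hypothetical, database-free version of the '/facets' loop:
# keys_by_label maps each Label to the property keys found on a sample node.
def build_facets(keys_by_label)
  keys_by_label.flat_map do |label, keys|
    # one facet per Label/property pair; VisualSearch.js groups them by :category
    keys.map { |key| { category: label, label: "#{label}.#{key}" } }
  end
end

facets = build_facets({ "Writer" => ["born", "name"], "Actor" => ["born", "name"] })
puts facets.to_json
```

Because each facet keeps its Label as the category, the drop-down can list properties under their Label without any hand-written configuration.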
Once a user clicks on one of the properties, we fill in some of the available values for that property. We can do this in Cypher by MATCHing the nodes with the specified Label that have the property we care about, and returning only the first 25 distinct values in sorted order.
    get '/values/:facet/' do
      content_type :json
      label, key = get_label_and_key(params)   # e.g. "Actor.name" -> ["Actor", "name"]
      # first 25 distinct values, in sorted order, of this property on nodes with this Label
      cypher = "MATCH node:#{label} WHERE HAS(node.#{key}) " \
               "RETURN DISTINCT node.#{key} AS value ORDER BY value LIMIT 25"
      $neo.execute_query(cypher)["data"].map { |row| row.first.to_s }.to_json
    end
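The route relies on a get_label_and_key helper whose body isn’t shown here. A plausible sketch, assuming the facet arrives as "Label.property", follows, together with an in-memory equivalent of what the query’s tail computes. Both method bodies are hypothetical illustrations, not the original code.

```ruby
# Hypothetical version of get_label_and_key (the original helper is not shown):
# a facet such as "Actor.name" is split into its Label and its property key.
def get_label_and_key(params)
  label, key = params[:facet].to_s.split(".", 2)
  [label, key]
end

# In-memory equivalent of the query's tail: HAS() drops missing values (nil here),
# DISTINCT removes duplicates, ORDER BY sorts, LIMIT keeps the first `limit`.
def first_distinct_values(values, limit = 25)
  values.compact.uniq.sort.first(limit)
end

p get_label_and_key({ facet: "Actor.name" })                         # => ["Actor", "name"]
p first_distinct_values(["Twister", nil, "RescueDawn", "Twister"])  # => ["RescueDawn", "Twister"]
```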
Now we can see some of the values in our search box. In this example, we are grabbing names of Actors in our graph.
The first 25 values are nice, but what if we’re looking for an Actor whose name begins with the letter Z, such as “Zach Grenier”? VisualSearch.js lets us start typing the value, and it will refresh our options to match.
We will enhance our previous query by adding a case-insensitive regular expression containing the term, or part of the term, we are looking for.
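    get '/values/:facet/:term' do
      content_type :json
      label, key = get_label_and_key(params)
      cypher = "MATCH node:#{label} WHERE HAS(node.#{key}) AND node.#{key} =~ {term} " \
               "RETURN DISTINCT node.#{key} AS value ORDER BY value LIMIT 25"
      # (?i) makes the match case-insensitive; =~ must match the whole string, hence .* on both sides.
      # Regexp.escape keeps characters such as "(" in the typed term from breaking the pattern.
      term = "(?i).*" + Regexp.escape(params[:term]) + ".*"
      $neo.execute_query(cypher, {:term => term})["data"].map { |row| row.first.to_s }.to_json
    end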
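Two details of the {term} parameter are easy to miss. The (?i) prefix makes the match case-insensitive, and Cypher’s =~ succeeds only when the regular expression matches the entire string, which is why the term is wrapped in .* on both sides. The minimal plain-Ruby sketch below emulates that whole-string rule with \A...\z anchors. It also escapes the typed term with Regexp.escape, an addition not taken from the original route. term_pattern and cypher_like_match? are hypothetical names.

```ruby
# Hypothetical helper: builds the {term} parameter for '/values/:facet/:term'.
# Regexp.escape turns characters like "(" into literals so typing them cannot break the regex.
def term_pattern(term)
  "(?i).*" + Regexp.escape(term) + ".*"
end

# Emulates Cypher's =~, which must match the whole string, using \A and \z anchors.
def cypher_like_match?(pattern, value)
  Regexp.new("\\A(?:#{pattern})\\z").match?(value)
end

p cypher_like_match?(term_pattern("zach"), "Zach Grenier")  # => true
p cypher_like_match?("zach", "Zach Grenier")                # => false (no .* and no (?i))
```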
Once we click on Zach Grenier, a few things happen. We get a little message telling us that:
You searched for: Actor.name: “Zach Grenier”. (1 node)
Our search bar comes alive again with the next set of Labels to query on…
… and our graph (currently consisting of just one node) is populated via vivagraph.js. See this previous vivagraph.js post for more information on how this great graph visualization library works.
Now, I know you may be thinking: we populated an Actor node, and now only Movie is available in our drop-down. How did that happen? That’s the magic of this application. Instead of offering any Label at random, we take the context of our first node and build a path of available connections from there. If we click on “Movie.title”, we call the following method under the covers to get our possibilities:
    post '/connected_values/:facet/' do
      content_type :json
      related_label, related_key = get_label_and_key(params)   # the facet being asked about
      match, where, values = prepare_query(params)              # pattern built from earlier selections
      last_node = get_last_node_id(params)
      where.pop                                                 # drop the last WHERE condition...
      where << "HAS(node#{last_node}.#{related_key})"           # ...and require the requested property instead
      cypher = prepare_cypher(match, where)
      cypher << "WITH LAST(EXTRACT(n in NODES(p) : n.#{related_key}?)) AS value "
      cypher << "RETURN DISTINCT value ORDER BY value LIMIT 25"
      parameters = prepare_parameters(values)
      $neo.execute_query(cypher, parameters)["data"].flatten.collect { |d| d.to_s }.to_json
    end
It looks a little complicated, but all we are doing is building a Cypher query dynamically that will end up looking like this:
    MATCH p = node0:Actor -- node1:Movie
    WHERE node0.name? = {value0} AND HAS(node1.title)
    WITH LAST(EXTRACT(n in NODES(p) : n.title?)) AS value
    RETURN DISTINCT value ORDER BY value LIMIT 25
This Cypher query is executed with the parameters {"value0" => "Zach Grenier"}. It finds the Actor node for Zach Grenier in the graph, then the nodes labeled “Movie” that are related to Zach Grenier, and extracts the “title” property from the last node in each path (these happen to be the movies Zach Grenier is in) to give us our answer.
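The helpers used in the route (prepare_query, prepare_cypher, prepare_parameters, get_last_node_id) are not shown in this post, so what follows is only a hedged sketch of how an ordered list of selections could be turned into a query of the shape printed above. build_connected_query and the [label, key, value] selection format are hypothetical. The final selection is the facet being asked about, so it has no value yet.

```ruby
# Hypothetical sketch: the original prepare_* helpers are not shown, so this
# only reproduces the shape of the generated query.
# selections: ordered [label, key, value] triples picked in the search box;
# the last one is the facet being asked about (value is nil).
def build_connected_query(selections)
  match = []
  where = []
  params = {}
  selections.each_with_index do |(label, key, value), i|
    match << "node#{i}:#{label}"                     # nodes are chained with untyped "--" hops
    if value.nil?
      where << "HAS(node#{i}.#{key})"                # last node: must have the requested property
    else
      where << "node#{i}.#{key}? = {value#{i}}"      # earlier nodes: pinned to the chosen value
      params["value#{i}"] = value
    end
  end
  related_key = selections.last[1]
  cypher = "MATCH p = #{match.join(' -- ')} " \
           "WHERE #{where.join(' AND ')} " \
           "WITH LAST(EXTRACT(n in NODES(p) : n.#{related_key}?)) AS value " \
           "RETURN DISTINCT value ORDER BY value LIMIT 25"
  [cypher, params]
end

cypher, params = build_connected_query([["Actor", "name", "Zach Grenier"], ["Movie", "title", nil]])
puts cypher
p params  # => {"value0"=>"Zach Grenier"}
```

Each additional hop only appends another node to the MATCH chain and another condition to the WHERE clause, so longer multi-hop patterns come out of the same loop.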
In our graph, only two things are connected to Zach Grenier: the movies “RescueDawn” and “Twister”. Let’s go ahead and click on Twister:
We query the graph for the pattern of an Actor named “Zach Grenier” connected to the Movie titled “Twister”. The graph finds this pattern and returns the nodes and relationships within it, and Twister gets added to our graph, connected to Zach Grenier.
The patterns we create can go beyond a single hop. For example: an Actor born in 1929 who acted in “Snow Falling on Cedars” alongside Rick Yune, who was also in “Ninja Assassin” alongside other actors…
    MATCH p = node0:Actor -- node1:Movie -- node2:Actor -- node3:Movie -- node4:Actor
    WHERE node0.born? = {value0} AND node1.title? = {value1}
      AND node2.name? = {value2} AND node3.title? = {value3}
      AND HAS(node4.name)
    WITH LAST(EXTRACT(n in NODES(p) : n.name?)) AS value
    RETURN DISTINCT value ORDER BY value LIMIT 25
This query is executed with the parameters {"value0" => 1929, "value1" => "Snow Falling on Cedars", "value2" => "Rick Yune", "value3" => "Ninja Assassin"}. One of the Actors at the end of the pattern is “Naomie Harris”, and once we click on her we get this graph:
Don’t just take my word for it, though. Try the live demo, take a look at the source code, and try pointing it at your own Neo4j 2.0 labeled graph.
What is missing?
This is a dynamic UI that gives an end user quick access to the graph. However, the astute observer will notice something is missing: the relationship types. The patterns we are creating and matching against the graph only care that nodes are connected, not how they are connected, and that might be a very important feature of our graph that we are omitting. Alas, this little project is not the last mile; it is but one step further, and eventually we’ll reach it.
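One possible way to close that gap would be to let each hop carry an optional relationship type. The sketch only builds the MATCH fragment. match_clause is a hypothetical name, and ACTED_IN / DIRECTED are assumed relationship type names for a movie graph like ours, not ones confirmed by this post.

```ruby
# Hypothetical extension: each hop between consecutive Labels may name a
# relationship type; nil keeps the current "any relationship" behaviour (--).
def match_clause(labels, rel_types)
  parts = ["node0:#{labels.first}"]
  labels.drop(1).each_with_index do |label, i|
    rel = rel_types[i] ? "-[:#{rel_types[i]}]-" : "--"
    parts << "#{rel} node#{i + 1}:#{label}"
  end
  "MATCH p = " + parts.join(" ")
end

puts match_clause(["Actor", "Movie"], [nil])
puts match_clause(["Actor", "Movie", "Director"], ["ACTED_IN", "DIRECTED"])
```

Keeping the types optional means the existing untyped patterns are generated exactly as before, while a typed hop narrows the match to one kind of connection.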
Help me work on these kinds of problems.
Understanding the power of graphs will give your data architect skills a boost. Don’t let this blog post be the last time you think in graphs. Learn about graphs at one of the dozens of events already on the Calendar and keep an eye out as more get added every week. Take some time to watch these great graph videos from the events you might have missed. Read the Graph Databases book, and of course… subscribe to my blog and follow me on Twitter.