A few weeks ago I showed you how to visualize a graph using the chord flare visualization and how to visualize a network using a force directed graph visualization from D3.js.
On Twitter Claire Willett from Riparian Data asked:
https://twitter.com/#!/RiparianData/status/169099913580396544
This post on Graphs Beyond the Hairball by Robert Kosara explains why some non-traditional graph visualizations may work better and links us to an article explaining what a Node Quilt is and how it’s useful. We’re going to just take the first step and build a Matrix representation of a graph. We will use one of the JUNG clustering algorithms to help us understand it.
The study of social networks goes back to at least the ancient Greeks, but we won’t go back that far in time today… just to 1977. A man named Wayne Zachary recorded the interactions of a Karate Club at a University for 2 years. During this time a conflict developed between the administrator and the instructor and the club broke up into two. Turns out you could predict which club each member would belong to by building a graph of their weighted relationships and spliting it along the minimum-cut.
We’re working with Neo4j, and you all know Neo knows Kung Fu, so we’ll do something a little different. Plus looking at a two node cluster is kinda lame… let’s go bigger. We’re going to be using Neo4j with the JUNG jars already in the lib directory. Refer to JUNG in Neo4j – Part 1 if you need help with that. Let’s create our graph:
def create_graph neo = Neography::Rest.new graph_exists = neo.get_node_properties(1) return if graph_exists && graph_exists['name'] names = %w[Aaron Achyuta Adam Adel Agam Alex Allison Amit Andreas Andrey Andy Anne Barry Ben Bill Bob Brian Bruce Chris Corey Dan Dave Dean Denis Eli Eric Esteban Ezl Fawad Gabriel James Jason Jeff Jennifer Jim Jon Joe John Jonathan Justin Kim Kiril LeRoy Lester Mark Max Maykel Michael Musannif Neil] commands = names.map{ |n| [:create_node, {"name" => n}]} names.each_index do |from| commands << [:add_node_to_index, "nodes_index", "type", "User", "{#{from}}"] connected = [] # create clustered relationships members = 5.times.collect{|x| x * 10 + (from % 10)} members.delete(from) rels = 1 + rand(4) rels.times do |x| to = members[x] connected << to commands << create_rel(from, to) unless to == from end # create random relationships rels = 1 + rand(4) rels.times do |x| to = rand(names.size) commands << create_rel(from, to) unless (to == from) || connected.include?(to) end end batch_result = neo.batch *commands end
I am once again using the first 50 names from the Graph DataBase- Chicago Meet-Up group. You’ve seen me do this once or twice already, so I won’t go over it in detail, but as you can see above, I am forcing small clusters of relationships to exist along with random relationships. Our relationships have a weight property, so the create_rel method is just this:
def create_rel(x,y,z= 1 + rand(10)) [:create_relationship, "knows", "{#{x}}", "{#{y}}", {:weight => z}] end
Our visualization is expecting a list of nodes already in groups, and a list of relationships that includes it’s strength. The JSON object looks like this:
{"nodes":[{"name":"Myriel","group":1}, {"name":"Napoleon","group":1}, {"name":"Mlle.Baptistine","group":2}], "links":[{"source":1,"target":0,"value":1}, {"source":2,"target":0,"value":8}, {"source":3,"target":0,"value":10}] }
Getting our nodes is pretty easy, we’ll use Gremlin to retrieve all but the root node.
def get_nodes neo = Neography::Rest.new neo.execute_script("g.V.filter{it.id != 0}.transform{[it.id,it.name]}") end
Getting the relationships is also pretty ease, to switch it up, we’ll use Cypher.
def get_relationships neo = Neography::Rest.new cypher_query = " START a = node:nodes_index(type='User')" cypher_query << " MATCH a-[r:knows]-b" cypher_query << " RETURN ID(a), ID(b), r.weight" neo.execute_query(cypher_query)["data"] end
Now comes the fun part. To get our clusters, we’ll be using the Voltage Clusterer Class. We don’t want the root node getting in the way, so we’ll create a sub-graph using TinkerGraph that excludes it (you could do something similar to cluster just a small part of a larger graph). We then pass this graph on to GraphJung and set our VoltageClusterer to try to get 10 clusters for us.
def get_voltage_clusters neo = Neography::Rest.new lg = neo.execute_script("import edu.uci.ics.jung.algorithms.cluster.VoltageClusterer; to = new TinkerGraph(); g.V.filter{it.id != 0}.sideEffect{toVertex = to.addVertex(it.getId()); ElementHelper.copyProperties(it, toVertex); }.iterate(); g.E.sideEffect{toEdge = to.addEdge(it.getId(), to.v(it.getOutVertex().getId()), to.v(it.getInVertex().getId()), it.getLabel()); ElementHelper.copyProperties(it, toEdge); }.iterate(); vc = new VoltageClusterer(new GraphJung(to), 10); vc.cluster(10).id; ") end
We can then put it all together to build that JSON object.
get '/cluster' do clusters = Hash.new get_voltage_clusters.each_with_index {|item, index| item.each{|i| clusters[i.to_i] = index + 1} } nodes = get_nodes.map{|n| {"name" => n[1], "group" => clusters[n[0]]}} relationships = get_relationships.map{|r| {"source" => r[0] - 1, "target" => r[1] - 1, "value" => r[2]} } {:nodes => nodes, :links => relationships}.to_json end
Our visualization was done by Mike Bostock with D3.js. As always, the code is on Github. Click on the image below to see the live example on Heroku.
Note: The version on Heroku is using a pre-generated JSON file.
Hi,
I love your blog, always interesting to read!
Is there a specific reason for using Cypher for the second query and not doing it also using a Gremlin script?
No technical reason. I just want to make sure people are comfortable with the idea of mixing the REST API, Cypher and Gremlin all in one application. There is no reason for it to be Cypher vs Gremlin, it’s really Cypher AND Gremlin.
[…] JUNG in Neo4j – Part 2 […]