JUNG in Neo4j – Part 2

A few weeks ago I showed you how to visualize a graph using the chord flare visualization and how to visualize a network using a force-directed graph visualization from D3.js.

On Twitter Claire Willett from Riparian Data asked:
https://twitter.com/#!/RiparianData/status/169099913580396544

This post, Graphs Beyond the Hairball by Robert Kosara, explains why some non-traditional graph visualizations may work better, and it links to an article explaining what a Node Quilt is and why it’s useful. We’re going to take just the first step and build a matrix representation of a graph, then use one of the JUNG clustering algorithms to help us make sense of it.
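A matrix view is essentially an adjacency matrix whose rows and columns are sorted so that members of the same group sit next to each other, which makes clusters show up as blocks along the diagonal. Here is a minimal, stdlib-only sketch of that idea; the `adjacency_matrix` helper and the sample data are hypothetical, and it assumes node/link hashes with "group", "source", "target" and "value" keys like the ones the visualization consumes later in this post.

```ruby
# Hypothetical helper and made-up sample data, for illustration only.
nodes = [
  { "name" => "A", "group" => 2 },
  { "name" => "B", "group" => 1 },
  { "name" => "C", "group" => 2 },
  { "name" => "D", "group" => 1 }
]
links = [
  { "source" => 0, "target" => 2, "value" => 5 },
  { "source" => 1, "target" => 3, "value" => 3 },
  { "source" => 0, "target" => 1, "value" => 1 }
]

def adjacency_matrix(nodes, links)
  # Order node positions by group, breaking ties by original position.
  order = nodes.each_index.sort_by { |i| [nodes[i]["group"], i] }
  row_of = {}
  order.each_with_index { |node_index, row| row_of[node_index] = row }

  size = nodes.size
  matrix = Array.new(size) { Array.new(size, 0) }
  links.each do |link|
    s = row_of[link["source"]]
    t = row_of[link["target"]]
    # Treat relationships as undirected: fill both mirrored cells.
    matrix[s][t] += link["value"]
    matrix[t][s] += link["value"]
  end
  [order.map { |i| nodes[i]["name"] }, matrix]
end

labels, matrix = adjacency_matrix(nodes, links)
puts labels.join(" ")                  # B D A C
matrix.each { |row| puts row.join(" ") }
# 0 3 1 0
# 3 0 0 0
# 1 0 0 5
# 0 0 5 0
```

Group 1 (B, D) and group 2 (A, C) each form a block on the diagonal; the lone 1 outside those blocks is the cross-group link.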

The study of social networks goes back to at least the ancient Greeks, but we won’t go back that far in time today… just to 1977. A man named Wayne Zachary recorded the interactions of a university karate club over two years. During that time a conflict developed between the club’s administrator and its instructor, and the club split in two. It turns out you could predict which club each member would join by building a graph of their weighted relationships and splitting it along the minimum cut.
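A minimum cut is the way of splitting a graph into two non-empty sides so that the total weight of the relationships crossing between them is as small as possible. For a toy graph you can find it by brute force. The `min_cut` helper below is a hypothetical sketch of that definition, not Zachary’s actual procedure; it is exponential in the number of nodes, so real tools use dedicated algorithms such as Stoer-Wagner.

```ruby
# Hypothetical helper: brute-force global minimum cut of a small weighted,
# undirected graph given as [u, v, weight] triples. Only for a handful of nodes.
def min_cut(num_nodes, edges)
  best = nil
  # Node 0 always sits on side A, so each split is tried only once.
  (0...(1 << (num_nodes - 1))).each do |mask|
    side_a = [0] + (1...num_nodes).select { |v| mask[v - 1] == 1 }
    next if side_a.size == num_nodes # side B must not be empty
    in_a = side_a.each_with_object({}) { |v, h| h[v] = true }
    # Sum the weights of relationships whose ends are on different sides.
    weight = edges.sum { |u, v, w| in_a[u] == in_a[v] ? 0 : w }
    best = [weight, side_a] if best.nil? || weight < best[0]
  end
  best
end

edges = [[0, 1, 5], [1, 2, 5], [0, 2, 5],   # first tight triangle
         [3, 4, 5], [4, 5, 5], [3, 5, 5],   # second tight triangle
         [2, 3, 1]]                         # weak bridge between them
p min_cut(6, edges) # => [1, [0, 1, 2]]
```

The cheapest split cuts only the weak bridge, separating the two triangles, which is the same intuition behind predicting the karate club break-up.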

We’re working with Neo4j, and you all know Neo knows Kung Fu, so we’ll do something a little different. Plus, looking at a graph that only splits in two is kinda lame… let’s go bigger. We’ll be using Neo4j with the JUNG jars already in the lib directory; refer to JUNG in Neo4j – Part 1 if you need help with that. Let’s create our graph:

require 'neography'

def create_graph
  neo = Neography::Rest.new
  # Skip creation if node 1 already has a name (graph built on a previous run)
  graph_exists = neo.get_node_properties(1)
  return if graph_exists && graph_exists['name']

  names = %w[Aaron Achyuta Adam Adel Agam Alex Allison Amit Andreas Andrey
             Andy Anne Barry Ben Bill Bob Brian Bruce Chris Corey
             Dan Dave Dean Denis Eli Eric Esteban Ezl Fawad Gabriel
             James Jason Jeff Jennifer Jim Jon Joe John Jonathan Justin
             Kim Kiril LeRoy Lester Mark Max Maykel Michael Musannif Neil]

  # One create_node command per name; later commands refer to the results
  # of earlier ones by their position in the batch: "{0}", "{1}", ...
  commands = names.map { |n| [:create_node, { "name" => n }] }

  names.each_index do |from|
    commands << [:add_node_to_index, "nodes_index", "type", "User", "{#{from}}"]
    connected = []

    # create clustered relationships: candidates share the same last digit
    # (from % 10), which plants 10 groups of 5 nodes
    members = 5.times.collect { |x| x * 10 + (from % 10) }
    members.delete(from)
    rels = 1 + rand(4)
    rels.times do |x|
      to = members[x]
      connected << to
      commands << create_rel(from, to) unless to == from
    end

    # create random relationships (noise that can cross the planted groups)
    rels = 1 + rand(4)
    rels.times do |x|
      to = rand(names.size)
      commands << create_rel(from, to) unless (to == from) || connected.include?(to)
    end
  end
  neo.batch(*commands)
end

I am once again using the first 50 names from the Graph Database - Chicago Meetup group. You’ve seen me do this once or twice already, so I won’t go over it in detail, but as you can see above, I am forcing small clusters of relationships to exist alongside random ones. Our relationships have a weight property, so the create_rel method is just this:

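# Batch command for a "knows" relationship between the nodes created by
# batch jobs {x} and {y}; the weight defaults to a random integer from 1 to 10
def create_rel(x, y, z = 1 + rand(10))
  [:create_relationship, "knows", "{#{x}}", "{#{y}}", { :weight => z }]
end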
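The clustering trick in create_graph is easy to miss inside the batch code: each node only picks cluster neighbours whose index ends in the same digit, so the 50 users fall into 10 planted groups of 5, and the random relationships then blur those groups. Here is a small stdlib-only sketch that reproduces just that neighbour selection with no Neo4j involved; the helper names are hypothetical.

```ruby
# Hypothetical helpers mirroring the "clustered relationships" step of
# create_graph: node i may only link to nodes with the same index % 10.
def planted_group(index)
  index % 10
end

def cluster_candidates(from)
  # Same expression as in create_graph, minus the node itself
  5.times.collect { |x| x * 10 + (from % 10) } - [from]
end

p cluster_candidates(13)   # => [3, 23, 33, 43]

groups = (0...50).group_by { |i| planted_group(i) }
p groups.size              # => 10
p groups[7]                # => [7, 17, 27, 37, 47]
```

This lines up with asking the clusterer for 10 clusters later on; how cleanly it recovers them depends on how much random noise each run adds.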

Our visualization expects a list of nodes that are already assigned to groups, and a list of relationships that includes each one’s strength. The JSON object looks like this (abbreviated):

{"nodes":[{"name":"Myriel","group":1},
          {"name":"Napoleon","group":1},
          {"name":"Mlle.Baptistine","group":2}],
 "links":[{"source":1,"target":0,"value":1},
          {"source":2,"target":0,"value":8},
          {"source":3,"target":0,"value":10}]
}
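In this format "source" and "target" are positions in the "nodes" array, not database ids, so every link has to point at a position that actually exists. A quick sanity check can catch off-by-one mistakes before they reach D3. The `dangling_links` helper and the input below are hypothetical, made up in the same shape as the JSON above.

```ruby
require "json"

# Hypothetical helper: returns the links whose "source" or "target" is not a
# valid position in the "nodes" array.
def dangling_links(data)
  count = data["nodes"].size
  data["links"].reject do |link|
    [link["source"], link["target"]].all? { |i| i.is_a?(Integer) && i >= 0 && i < count }
  end
end

# Made-up input: two nodes, and one link pointing at position 2.
json = <<~JSON
  {"nodes": [{"name": "Myriel", "group": 1},
             {"name": "Napoleon", "group": 1}],
   "links": [{"source": 1, "target": 0, "value": 1},
             {"source": 2, "target": 0, "value": 8}]}
JSON

bad = dangling_links(JSON.parse(json))
p bad.map { |link| [link["source"], link["target"]] } # => [[2, 0]]
```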

Getting our nodes is pretty easy: we’ll use Gremlin to retrieve every node except the root node (id 0).

def get_nodes
  neo = Neography::Rest.new
  neo.execute_script("g.V.filter{it.id != 0}.transform{[it.id,it.name]}")
end

Getting the relationships is also pretty easy; to switch things up, we’ll use Cypher.

def get_relationships
  neo = Neography::Rest.new
  cypher_query =  " START a = node:nodes_index(type='User')"
  cypher_query << " MATCH a-[r:knows]-b"
  cypher_query << " RETURN ID(a), ID(b), r.weight"
  neo.execute_query(cypher_query)["data"]
end
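One thing to keep in mind: because the pattern a-[r:knows]-b has no arrow, Cypher matches each relationship from both ends, so every relationship comes back twice with the ids swapped. For a symmetric matrix that may be harmless, but if you want each relationship only once you could use a directed pattern such as a-[r:knows]->b, or drop the mirrored rows in Ruby. A minimal sketch of the latter, using made-up rows in the same [ID(a), ID(b), r.weight] shape:

```ruby
# Made-up rows shaped like the Cypher result: [ID(a), ID(b), r.weight].
# Each relationship appears twice because the pattern is undirected.
rows = [[1, 2, 7], [2, 1, 7], [1, 3, 4], [3, 1, 4]]

# Keep the first row for each unordered pair of node ids.
# Caveat: this also merges two distinct relationships between the same pair;
# returning ID(r) from the query would let you de-duplicate exactly.
unique = rows.uniq { |a, b, _weight| [a, b].minmax }
p unique # => [[1, 2, 7], [1, 3, 4]]
```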

Now comes the fun part. To get our clusters, we’ll use JUNG’s VoltageClusterer class. We don’t want the root node getting in the way, so we’ll use TinkerGraph to create a sub-graph that excludes it (you could do something similar to cluster just a small part of a larger graph). We then wrap this sub-graph in GraphJung and ask our VoltageClusterer to find 10 clusters for us.

def get_voltage_clusters
  neo = Neography::Rest.new
  lg = neo.execute_script("import edu.uci.ics.jung.algorithms.cluster.VoltageClusterer;
                            to = new TinkerGraph();
                            g.V.filter{it.id != 0}.sideEffect{toVertex = to.addVertex(it.getId()); 
                                           ElementHelper.copyProperties(it, toVertex);
                            }.iterate();   
                            g.E.sideEffect{toEdge = to.addEdge(it.getId(), 
                                                    to.v(it.getOutVertex().getId()), 
                                                    to.v(it.getInVertex().getId()),
                                                    it.getLabel());
                                            ElementHelper.copyProperties(it, toEdge);
                            }.iterate();
                            vc = new VoltageClusterer(new GraphJung(to), 10);
                            vc.cluster(10).id;
                            ")
end
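The idea behind VoltageClusterer comes from Wu and Huberman’s voltage-based clustering: treat the graph as an electrical circuit, hold one vertex at 1 volt and another at 0, and the other vertices settle at voltages close to the pole in their own cluster. As we read the JUNG documentation, the real class samples many random source/sink pairs and then groups vertices by their voltages with k-means. The `voltage_split` helper below is a hypothetical, deliberately simplified two-cluster version with hand-picked poles, meant only to show the core idea; treating weights as conductances is our own choice.

```ruby
# Hypothetical helper: simplified two-cluster voltage clustering.
# This is NOT JUNG's VoltageClusterer code. Weights act as conductances.
def voltage_split(num_nodes, edges, source, sink, iterations: 200)
  neighbours = Array.new(num_nodes) { [] }
  edges.each do |u, v, w|
    neighbours[u] << [v, w]
    neighbours[v] << [u, w]
  end

  # Pin the poles: source at 1 volt, sink at 0 volts, everything else at 0.
  volts = Array.new(num_nodes, 0.0)
  volts[source] = 1.0

  # Repeatedly set each free vertex to the weighted average of its
  # neighbours' voltages from the previous round (Jacobi-style relaxation).
  iterations.times do
    next_volts = volts.dup
    num_nodes.times do |i|
      next if i == source || i == sink || neighbours[i].empty?
      total = neighbours[i].sum { |_, w| w }
      next_volts[i] = neighbours[i].sum { |j, w| volts[j] * w } / total.to_f
    end
    volts = next_volts
  end

  # Vertices closer to the source's voltage form the first cluster.
  (0...num_nodes).partition { |i| volts[i] >= 0.5 }
end

edges = [[0, 1, 5], [1, 2, 5], [0, 2, 5],   # first tight triangle
         [3, 4, 5], [4, 5, 5], [3, 5, 5],   # second tight triangle
         [2, 3, 1]]                         # weak bridge between them
p voltage_split(6, edges, 0, 5) # => [[0, 1, 2], [3, 4, 5]]
```

In a real graph you would not know good poles in advance, which is where sampling many source/sink pairs comes in.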

We can then put it all together to build that JSON object.

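get '/cluster' do
  # node id => cluster number (1-based)
  clusters = Hash.new
  get_voltage_clusters.each_with_index { |item, index| item.each { |i| clusters[i.to_i] = index + 1 } }
  nodes = get_nodes.map { |n| { "name" => n[1], "group" => clusters[n[0]] } }
  # Node ids start at 1 (node 0 is the excluded root), but D3 expects 0-based
  # positions in the nodes array; this assumes the ids are contiguous and
  # that get_nodes returns them in id order
  relationships = get_relationships.map { |r| { "source" => r[0] - 1, "target" => r[1] - 1, "value" => r[2] } }
  { :nodes => nodes, :links => relationships }.to_json
end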
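The `- 1` offset works as long as node ids run 1, 2, 3… in the same order as the nodes list. A slightly more defensive variant (our own suggestion, not the original code) maps each node id to its position in the nodes list instead. The sketch below uses hypothetical canned values in place of the three Neo4j calls, with deliberately non-contiguous ids.

```ruby
require "json"

# Hypothetical stand-ins for get_voltage_clusters, get_nodes and
# get_relationships; the ids are deliberately not contiguous.
voltage_clusters = [[5, 9], [7]]
node_rows        = [[5, "Aaron"], [7, "Achyuta"], [9, "Adam"]]
rel_rows         = [[5, 9, 6], [7, 9, 2]]

# node id => 1-based cluster number, as in the /cluster route
clusters = {}
voltage_clusters.each_with_index do |ids, index|
  ids.each { |id| clusters[id.to_i] = index + 1 }
end

# node id => position in the "nodes" array, which is what D3 expects
position = {}
node_rows.each_with_index { |(id, _name), index| position[id] = index }

nodes = node_rows.map { |id, name| { "name" => name, "group" => clusters[id] } }
links = rel_rows.map do |a, b, weight|
  # fetch raises KeyError if a relationship points at an unknown node
  { "source" => position.fetch(a), "target" => position.fetch(b), "value" => weight }
end

puts({ nodes: nodes, links: links }.to_json)
```

Using `fetch` means a relationship to a node that was filtered out fails loudly instead of silently producing a bad index.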

Our visualization was done by Mike Bostock with D3.js. As always, the code is on GitHub, and there is a live example on Heroku.

Note: The version on Heroku is using a pre-generated JSON file.

3 thoughts on “JUNG in Neo4j – Part 2”

Bob says:

March 16, 2012 at 6:41 AM

Hi,
I love your blog, always interesting to read!

Is there a specific reason for using Cypher for the second query and not doing it also using a Gremlin script?

maxdemarzi says:

March 16, 2012 at 8:24 AM

No technical reason. I just want to make sure people are comfortable with the idea of mixing the REST API, Cypher and Gremlin all in one application. There is no reason for it to be Cypher vs Gremlin, it’s really Cypher AND Gremlin.

JUNG in Neo4j – Part 2 « Another Word For It says:

March 16, 2012 at 6:35 PM

[…] JUNG in Neo4j – Part 2 […]

Max De Marzi

Graphs, Graphs, and nothing but the Graphs