Visualizing a set of Hiveplots with Neo4j

What should a graph look like and how can I tell two graphs apart?

These are questions Martin Krzywinski (Genome Sciences Center, Vancouver, BC) has been asking. Take a look at the picture below:

It’s the same graph, the same data, visualized 8 different ways. Which is the right way? What advantage does one layout give over the other? Can you tell it’s the same network? I can’t.

Eight layouts might be too much, so let’s look at just one in the next picture:

Martin took the spring-embedded visualization and tweaked it. Can you tell it’s the same graph, with the same data underneath? I can’t.

To tackle this problem, Martin invented the Hive Plot, a perceptually uniform and scalable layout visualization for network visual analytics.

If you want to learn more about Hive Plots, take a look at his website and this presentation (a large download at 20 MB). I cannot do them justice in this short blog post, and in all honesty I haven’t had the time to study them properly.

Today I just want to give you a little taste of Hive Plots. I am going to visualize the GitHub graphs of nine languages you might not have heard of: Boo, Dylan, Factor, Gosu, Mirah, Nemerle, Nu, Parrot, Self. I’m not going to show you how to create the graph this time, because we are using real data. You can find it in the data folder on GitHub.

The graph is basically (Language)–(Repository)–(User). There are two relationship types between Repository and User: wrote and forked.

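I’ll show you how to get the data out and into our visualization. First, for a given language, we build a map from each repository to the names of the users who wrote it.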

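```ruby
require 'neography'

# Returns a map of repository name => names of the users who wrote it,
# for every repository connected to the given language.
def wroterepos(language)
  neo = Neography::Rest.new
  # The language is passed as a script parameter instead of being
  # interpolated into the Gremlin source, so a crafted name taken from
  # the URL cannot change the query.
  neo.execute_script("m = [:]
                      g.V.filter{it.type == 'language' && it.name == language}
                       .in.transform{m[it.name] = it.in('wrote').gather{it.name}.next()}
                       .iterate()
                      m", {:language => language})
end
```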
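If you don’t have Neo4j running, the following plain-Ruby sketch mimics the map the Gremlin script builds, over a tiny in-memory graph. The edge list, the repository and user names, and the "written_in" label are all made up for illustration (the post does not name the Repository to Language relationship), and unlike the real script this sketch does not filter vertices on their type property.

```ruby
# Hypothetical in-memory stand-in for the Neo4j graph. Each edge points
# from -> to, like a relationship in the database. "written_in" is an
# invented label; the post does not name the Repository->Language edge.
Edge = Struct.new(:from, :label, :to)

# Names of vertices with an edge pointing into v, optionally restricted to
# one label: a rough analogue of Gremlin's `in` and `in('wrote')` steps.
def in_names(edges, v, label = nil)
  edges.select { |e| e.to == v && (label.nil? || e.label == label) }.map(&:from)
end

# repository name => names of users linked to it by `rel`, the same shape
# as the map `m` the Gremlin script returns.
def repo_map(edges, language, rel)
  in_names(edges, language).each_with_object({}) do |repo, m|
    m[repo] = in_names(edges, repo, rel)
  end
end

edges = [
  Edge.new("repo1", "written_in", "Boo"),
  Edge.new("repo2", "written_in", "Boo"),
  Edge.new("alice", "wrote",  "repo1"),
  Edge.new("alice", "wrote",  "repo2"),
  Edge.new("bob",   "forked", "repo1"),
  Edge.new("carol", "forked", "repo1")
]

wrote_map = repo_map(edges, "Boo", "wrote")
```

Here `wrote_map` maps both repo1 and repo2 to ["alice"]. Passing "forked" instead maps repo1 to ["bob", "carol"] and repo2 to an empty list.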

We do the same thing for the forked relationship. This may seem a bit strange, but what I am doing is roughly the Gremlin equivalent of a SQL LEFT OUTER JOIN.

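```ruby
require 'neography'

# Returns a map of repository name => names of the users who forked it,
# for every repository connected to the given language.
def forkedrepos(language)
  neo = Neography::Rest.new
  # Same query as wroterepos, following 'forked' instead of 'wrote';
  # the language is again passed as a script parameter.
  neo.execute_script("m = [:]
                      g.V.filter{it.type == 'language' && it.name == language}
                       .in.transform{m[it.name] = it.in('forked').gather{it.name}.next()}
                       .iterate()
                      m", {:language => language})
end
```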
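To make the join analogy concrete, here is the difference in plain Ruby (not Gremlin), using made-up rows: an inner join would drop repositories nobody forked, while a left outer join keeps every repository and gives the unforked ones an empty list.

```ruby
# Hypothetical rows: every repository of one language, and the
# (repository, forker) pairs that exist.
repos = ["repo1", "repo2", "repo3"]
forks = [["repo1", "bob"], ["repo1", "carol"], ["repo3", "dave"]]

# INNER JOIN: only repositories with at least one fork survive.
inner_join = forks.group_by(&:first)
                  .map { |repo, rows| [repo, rows.map(&:last)] }
                  .to_h

# LEFT OUTER JOIN: every repository is kept; unforked ones get [].
left_outer_join = repos.map { |repo| [repo, inner_join.fetch(repo, [])] }.to_h

p inner_join.keys       # => ["repo1", "repo3"]
p left_outer_join.keys  # => ["repo1", "repo2", "repo3"]
```

Keeping the unforked repositories matters here because they still belong on the plot, just without any fork links.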

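Now we do some Ruby magic to put our data into the JSON format the visualization wants: a flat list of nodes, each with a name, a node type, and the names of the nodes it links to under "imports".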

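```ruby
require 'sinatra'
require 'json'

# Builds the node list for one language's hive plot. Every node is a hash
# with a "name", the names it links to ("imports") and a "node_type".
get '/hive/:name' do
  wrote  = wroterepos(params[:name])   # repository => [writer names]
  forked = forkedrepos(params[:name])  # repository => [forker names]

  # Repository nodes link to their writers and forkers. Forks are looked up
  # by repository name rather than by position, so the two maps do not
  # need to come back in the same order.
  repos = wrote.map do |repo, repo_writers|
    {"name"      => repo,
     "imports"   => repo_writers + forked.fetch(repo, []),
     "node_type" => "repo"}
  end

  # The first writer of each repository is treated as its author and links
  # to everyone who forked any of that author's repositories.
  author_links = Hash.new { |hash, name| hash[name] = [] }
  wrote.each do |repo, repo_writers|
    next if repo_writers.empty?  # no writer: nothing to attach forks to
    author_links[repo_writers.first].concat(forked.fetch(repo, []))
  end
  writers = author_links.map do |name, forker_names|
    {"name" => name, "imports" => forker_names.uniq, "node_type" => "writer"}
  end

  # Forkers who are not also writers get a node with no outgoing links.
  forkers = (forked.values.flatten.uniq - author_links.keys).map do |name|
    {"name" => name, "imports" => [], "node_type" => "forker"}
  end

  content_type :json
  (repos + writers + forkers).to_json
end
```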
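If a repository ever has more than one writer, the route adds all of them to the repository’s "imports" but creates a writer node only for the first one. A small check like the following (a hypothetical helper, not part of the posted code) lists imported names that have no node of their own. It assumes the D3 layout needs every linked name to exist as a node; the sample nodes are made up.

```ruby
# Names listed in some node's "imports" that have no node of their own.
def dangling_imports(nodes)
  known = nodes.map { |n| n["name"] }
  nodes.flat_map { |n| n["imports"] }.uniq - known
end

# Hypothetical output for a repository written by two users: only the first
# writer ("alice") became a writer node, so "dan" is left dangling.
nodes = [
  {"name" => "repo1", "imports" => ["alice", "dan", "bob"], "node_type" => "repo"},
  {"name" => "alice", "imports" => ["bob"], "node_type" => "writer"},
  {"name" => "bob",   "imports" => [],      "node_type" => "forker"}
]

p dangling_imports(nodes)  # => ["dan"]
```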

The blue nodes are our repositories, the yellow nodes are our writers, and the green nodes are our forkers. The 12 o’clock axis (the top) shows nodes with only outgoing relationships. The bottom-left axis shows nodes with only incoming relationships: the writers whose repositories were never forked, and the forkers who never started their own public projects. The remaining nodes, on the bottom-right axis, have both incoming and outgoing relationships. These are the writers who created projects other people found worth forking.
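The axis rule can be sketched in a few lines of Ruby. This is a hypothetical helper, not the D3 code behind the plots: it assumes each name in a node’s "imports" list counts as an outgoing link from that node, so repositories only link out and pure forkers only receive links. The sample nodes are made up.

```ruby
# Assigns each node to a hive plot axis from its link directions,
# treating every entry in "imports" as an outgoing link.
def hive_axes(nodes)
  outgoing = Hash.new(0)
  incoming = Hash.new(0)
  nodes.each do |n|
    n["imports"].each do |target|
      outgoing[n["name"]] += 1
      incoming[target] += 1
    end
  end

  nodes.map do |n|
    name = n["name"]
    axis =
      if incoming[name].zero?    then :top           # only outgoing links
      elsif outgoing[name].zero? then :bottom_left   # only incoming links
      else                            :bottom_right  # both directions
      end
    # Note: a node with no links at all lands on :top here; the post does
    # not say where such nodes go.
    [name, axis]
  end.to_h
end

nodes = [
  {"name" => "repo1", "imports" => ["alice", "bob"]},  # repository
  {"name" => "alice", "imports" => ["bob"]},           # writer whose repo was forked
  {"name" => "bob",   "imports" => []}                 # forker
]

axes = hive_axes(nodes)
```

For this sample, repo1 goes to the top axis, bob to the bottom-left, and alice, a writer whose project was forked, to the bottom-right.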

The graphs are laid out left to right, row by row, in this order:

Boo, Dylan, Factor
Gosu, Mirah, Nemerle
Nu, Parrot, Self

Can you see the similarities between Boo, Factor and Nemerle? See how different they are from Gosu and Self? What does the hive plot tell you about the GitHub repositories of these languages?

You can try a live version at hiveplot.herokuapp.com/index.html, and as always the code is available on GitHub.

Our visualization was built by Rich Morin and Mike Bostock with D3.js. It is a hot-off-the-press work in progress; you can follow the action on this D3.js Google Group thread.


Max De Marzi

Graphs, Graphs, and nothing but the Graphs