Hubway Data Visualization Challenge with Neo4j

Michael Hunger imported the Hubway Challenge dataset into a Neo4j graph database and made it available for us to play with.

The data model looks like this:

I took a quick stab at it using just the Stations.

The following Cypher query counts the trips between each pair of start and end stations and returns the 200 busiest pairs, ordered by number of trips.

require 'neography'

# Returns rows of [start station name, end station name, trip count].
def stations
  neo = Neography::Rest.new
  cypher = "START start_stations = node:Station(\"stationId:*\")
            MATCH start_stations <-[:`START`]- trips -[:END]-> end_stations
            RETURN start_stations.name, end_stations.name, COUNT(*) AS cnt
            ORDER BY cnt DESC
            LIMIT 200"
  neo.execute_query(cypher)["data"]
end

I build a JSON document that lists, for each starting station, its name and an array of the ending stations' names with the number of trips to each.

get '/visualization' do
  stations.group_by(&:first).map do |start_name, rows|
    { "name"    => start_name,
      "follows" => rows.collect { |n| { "name" => n[1], "counts" => n[2] } } }
  end.to_json
end
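
To make the shape of that JSON concrete, here is a minimal sketch of the same grouping step run against a few hand-written rows. The station names and counts are made up for illustration, and `to_follows` is a hypothetical helper name, not something from the app itself.

```ruby
require 'json'

# Hypothetical rows in the [start_name, end_name, count] shape the query returns.
SAMPLE_ROWS = [
  ["Station A", "Station B", 120],
  ["Station A", "Station C", 80],
  ["Station B", "Station A", 95]
]

# Group rows by start station, then list each end station with its trip count.
# (to_follows is an illustrative name, not part of the original code.)
def to_follows(rows)
  rows.group_by(&:first).map do |start_name, group|
    { "name"    => start_name,
      "follows" => group.map { |row| { "name" => row[1], "counts" => row[2] } } }
  end
end

puts JSON.pretty_generate(to_follows(SAMPLE_ROWS))
```

Each start station ends up as one object whose "follows" array carries one entry per destination, which is the structure the client-side code walks over.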

The count is later used to determine the width of the ribbons from one station to the other.

    // Fill a square matrix: matrix[i][j] holds the trip count from station i to station j.
    paths.forEach(function(d) {
        var source = indexByName[name(d.name)],
            row = matrix[source];
        if (!row) {
            row = matrix[source] = [];
            for (var i = -1; ++i < n;) row[i] = 0;
        }
        d.follows.forEach(function(d) { row[indexByName[name(d.name)]] = d.counts; });
    });
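
For readers more at home in Ruby, the same matrix construction can be sketched roughly as follows. This is an illustration only: `build_matrix` is a hypothetical helper, the station names are made up, and giving every station (start or end) one row and one column in order of first appearance is an assumption about how the index is built.

```ruby
# Build a square matrix where matrix[i][j] is the number of trips
# from station i to station j. Input uses the "name"/"follows" shape
# of the JSON document. (build_matrix is an illustrative name.)
def build_matrix(paths)
  # Assign each distinct station name an index, in order of first appearance.
  names = paths.flat_map { |path| [path["name"]] + path["follows"].map { |f| f["name"] } }.uniq
  index_by_name = names.each_with_index.to_h

  # Start with an n x n matrix of zeros, one independent row per station.
  n = names.size
  matrix = Array.new(n) { Array.new(n, 0) }

  paths.each do |path|
    row = matrix[index_by_name[path["name"]]]
    path["follows"].each { |f| row[index_by_name[f["name"]]] = f["counts"] }
  end

  [names, matrix]
end

# Hypothetical input with made-up stations and counts.
sample_paths = [
  { "name" => "Station A", "follows" => [{ "name" => "Station B", "counts" => 120 },
                                         { "name" => "Station C", "counts" => 80 }] },
  { "name" => "Station B", "follows" => [{ "name" => "Station A", "counts" => 95 }] }
]

names, matrix = build_matrix(sample_paths)
names.zip(matrix).each { |name, row| puts "#{name}: #{row.inspect}" }
```

A station that only ever appears as a destination still gets an all-zero row, which keeps the matrix square.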

Check out the live version on Heroku. As always, the code is available on GitHub.

So that’s a nice starting point. Come to the Hack Day at The Bocoup Loft in Downtown Boston on Saturday, October 27, 2012, and see how far you can take this.


Max De Marzi
