Who is the Greatest?

I done wrestled with an alligator, I done tussled with a whale, only last week I murdered a rock, injured a stone, hospitalized a brick. I’m so mean I make medicine sick.

He floats like a butterfly and stings like a bee, but is Muhammad Ali truly the greatest? Greater than the Beatles? Greater than Alexander? Greater than Sliced Bread? Let’s find out.

We begin by requiring neography and creating a function to help us create the greats.

require 'rubygems'
require 'neography'

def create_great(name)
  Neography::Node.create("name" => name)
end

There are a ton of greats out there, but we’ll keep it simple and create just 8 of the greatest greats.

game      = create_great('The 1958 NFL Championship Game')
brando    = create_great('Marlon Brando')
alex      = create_great('Alexander the Great')
circus    = create_great('The Ringling Bros. and Barnum and Bailey')
beatles   = create_great('The Beatles')
ali       = create_great('Muhammad Ali')
bread     = create_great('Sliced Bread')
gatsby    = create_great('The Great Gatsby')

greats = [game,brando,alex,circus,beatles,ali,bread,gatsby]

That’s a pretty good list. We can’t say one great is greater than another, but given any two greats, we can say that they are “As Great” as each other.

def as_great(great, other_greats)
  other_greats.each do |og|
    great.outgoing(:as_great) << og
  end
end

So we will use this to our advantage and let fate (randomness) decide. We will randomly create “As Great” relationships between each great and up to 5 other greats, excluding themselves.

greats.each do |g|
  ogs = greats.select{|v| v != g }.sample(1 + rand(5))
  as_great(g, ogs)
end

Now that we have our 8 greats linked to each other we can determine who the greatest is. We do this by traversing the graph from each node, following our outgoing relationships and keeping track of which nodes we visit along the way.

def the_greatest
  neo = Neography::Rest.new
  neo.execute_script("m = [:];
                      c = 0;
                      g.
                        V.
                        out.
                        groupCount(m).
                        loop(2){c++ < 1000}.iterate();
                        
                        m.sort{a,b -> b.value <=> a.value}.keySet().name[0];")
end

puts "The greatest is #{the_greatest}"

Let’s break this down. We create a map “m” and a counter “c”.

m = [:];
c = 0;

Then from g (which if you remember my post about Gremlin is our graph). We are going to start at each of our vertices (or nodes).

g.
V.

and follow the outgoing relationships.

out.

the .out step emits a node, which we stuff into m, and every time we see that node again, we increment m with the groupCount method.

groupCount(m).

The loop(2) means we want to loop over the last two steps ( .out.groupCount(m) ) and the closure following it {c++ < 1000} means we will keep looping until our counter reaches 1000. Finally we iterate() until its done.

loop(2){c++ < 1000}.iterate();

At this point “m” is going to look like this:

==> v[5]=304
==> v[8]=304
==> v[2]=415
==> v[6]=757
==> v[7]=284
==> v[1]=198
==> v[4]=260
==> v[3]=431

Which represents the vertices and how many times we’ve traversed over them. What we want is to get the node we see the most, so we order them in descending order, and get the name of the first element. Don’t let the keySet method bother you, it just turns the keys of the map ( the vertices) into a set from which we get the name properties and grab just the first one.

m.sort{a,b -> b.value <=> a.value}.keySet().name[0];")

So finally we run :

puts "The greatest is #{the_greatest}"

and the answer is… Muhammad Ali. Yup, he is the greatest, confirmed. If you try this at home, you might find Alexander or maybe the “Greatest Show on Earth” turned up to be the greatest. It is in the hands of rand.

Centrality

So that’s fantastic, but why did we want to do this? Well, have you ever wondered who among your Facebook friends is the biggest social butterfly? Who is the most influential of your twitter followers? If the nodes were roads or places, then maybe which are the most traveled or the most used?

These are measures of Centrality which tell you how important a node is in a graph. Now you have what you need to build your own version of Klout or answer important questions about the data in your projects.

Tagged , , , , ,

7 thoughts on “Who is the Greatest?

  1. Anonymous says:

    What tool did you use to render the graph visually (a.k.a great_graph.png) ?

  2. Ben says:

    did you know about this? http://neo4j-challenge.herokuapp.com/ might want to enter!

  3. maxdemarzi says:

    @Mandi

    If you only want to pass through nodes with some property of a certain value you can use .filter, for example people over 190 cm:

    .out.filter{it.getProperty(‘height’)> 190}

  4. […] may remember my previous post on Centrality which showed you how to use Gremlin to find out who the Greatest was? If we try to use the same […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: