Max Flow with Gremlin and Transactions

The maximum flow problem was formulated by T.E. Harris as follows:

Consider a rail network connecting two cities by way of a number of intermediate cities, where each link of the network has a number assigned to it representing its capacity. Assuming a steady state condition, a nd a maximal flow from one given city to the other.

Back in the mid 1950s the US Military had an interest in finding out how much capacity the Soviet railway network had to move cargo from the Western Soviet Union to Eastern Europe. This lead to the Maximum Flow problem and the Ford–Fulkerson algorithm to solve it.

If you’ve been reading the Neo4j Gremlin Plugin documentation, you’ll remember it has a section on Flow algorithms with Gremlin. Let’s add a couple of things and bring this example to life.

We’re going to be modeling a simple railway system that needs to transport cargo from California to Illinois. A couple of direct routes exist, and additionally a route going through Texas. First step is to create our graph:

def create_graph
  neo = Neography::Rest.new
  graph_exists = neo.get_node_properties(1)
  return if graph_exists && graph_exists['name']

  states = [{:name => "California", :coordinates => [-119.355165,35.458606]},
            {:name => "Illinois",   :coordinates => [ -88.380238,41.278216]},
            {:name => "Texas",      :coordinates => [ -97.388631,30.943149]}]

  commands = states.map{ |n| [:create_node, n]}
  
  states.each_index.map do  |n| 
    commands << [:add_node_to_index, "states_index", "name", states[n][:name], "{#{n}}"] 
  end
  
  commands << [:create_relationship, "connected", "{#{0}}", "{#{1}}", {:capacity => 1}]    
  commands << [:create_relationship, "connected", "{#{0}}", "{#{1}}", {:capacity => 2}]
  commands << [:create_relationship, "connected", "{#{0}}", "{#{2}}", {:capacity => 1}]
  commands << [:create_relationship, "connected", "{#{2}}", "{#{1}}", {:capacity => 3}]

  batch_result = neo.batch *commands
end

You’ve seen me do this a few times already, so I won’t spend too much time on it. Just notice we’re adding the states names to an index, and using the Batch REST command to create it all at once. We’ll write our max_flow method next:

def max_flow
  neo = Neography::Rest.new
  neo.execute_script("source = g.idx('states_index')[[name:'California']].iterator().next(); 
                      sink = g.idx('states_index')[[name:'Illinois']].iterator().next();
                      
                      max_flow = 0;
                      g.setMaxBufferSize(0);
                      g.startTransaction();
                       
                      source.outE.inV.loop(2){
                        !it.object.equals(sink)}.
                      paths.each{
                        flow = it.capacity.min();
                        max_flow += flow;
                        it.findAll{
                          it.capacity}.each{
                            it.capacity -= flow}
                            };
                      g.stopTransaction(TransactionalGraph.Conclusion.FAILURE); 
                            
                      max_flow;")
end  

Let’s take a closer look at a few things. We use the index to look up our source (start) and sink (end) nodes, and use iterator().next() to get the first node from the Gremlin Groovy Pipeline returned by the index lookup. We also create a variable max_flow where our answer will go.

source = g.idx('states_index')[[name:'California']].iterator().next(); 
sink = g.idx('states_index')[[name:'Illinois']].iterator().next();
max_flow = 0;

We then set the transaction mode to manual by setting the MaxBufferSize to zero and start a new transaction. I’ll explain why in a minute.

g.setMaxBufferSize(0);
g.startTransaction();

From our source, we go to a neighboring node looping these two outE.inV steps until we reach the sink node.

source.outE.inV.loop(2){
!it.object.equals(sink)}.

For each path we find the lowest capacity along the edges we traversed using the min() function and add it to the max_flow variable we created earlier.

paths.each{
  flow = it.capacity.min();
  max_flow += flow;

Then we subtract the flow from the capacity property of each of the edges in our path. Take note we are actually altering data in this step.

it.findAll{
  it.capacity}.each{
    it.capacity -= flow}
};

At the end we return max_flow which has the answer to our question.

max_flow;

If you tried to run this method again, or tried to run a similar method using different sinks and sources that traveled over these nodes you’ll have a problem. The capacities were modified and will most likely be zero or show the residual capacity of the transportation network we built.

So to prevent this we stop the transaction with a Failure. The changes we made to capacity are not committed and the graph stays the way it was.

g.stopTransaction(TransactionalGraph.Conclusion.FAILURE); 

We can visualize this example using D3.js and its Geo Path projections:

As usual, all code is available on github. The max flow and related problems manifest in many ways. Water or sewage through underground pipes, passengers on a subway system, data through a network (the internet is just a series of tubes!), roads and highway planning, airline routes, even determining which sports teams have been eliminated from the playoffs.

Tagged , , , , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 1,579 other followers

%d bloggers like this: