Flight Search with the Neo4j Traversal API

Screen Shot 2015-08-30 at 2.21.07 AM

Before Cypher came along, if you wanted to describe a graph traversal in Neo4j you would use the Traversal Framework Java API. The Traversal API is one of the many hidden gems of Neo4j and today we are going to take a closer look at it. Traversing a graph is about going on a journey. All journeys have a starting point (or points) so that’s the first thing we have to do, figure out where in the graph we begin. It can be a single node, or multiple ones, but they will go on the journey following the same rules, so its easier if it’s just one node or nodes of the same “type”.

Then we have to build a set of rules or description the node(s) will follow. The first decision is Order. Should a node follow a path all the way to the end as far as it can go (depth-first), or should it explore each path one step at a time(breadth-first). Think about it, you have a fork in the road, do you go all the way to the end of one path, or do you go one step in one path, go back, and then take one step in the other path, then two steps on the first path, then back and two steps on the second path, etc? You can tell that following one path to the end is going to be easier to do (less memory intensive) and in most cases it is the preferred Order.

The next decision you have to make is Uniqueness. Do you want to be able to visit nodes multiple times, relationships multiple times, do you not care at all? When using Cypher, you cannot match a pattern that goes over the same relationship multiple times, it enforces relationship uniqueness. Unless you are doing graph analytics, this is what you want to do most of the time.

Which relationship types should you traverse to get to your destination? That’s the role of Path Expanders. Back to our journey analogy, you may want to traverse down yellow brick roads, but not red brick ones. Maybe you want to take a walk on one relationship type for your first step, and a different relationship type your second step.

All journeys have an end, but not all are successful. The Evaluator decides if we reach a valid end to the traversal, or if we need to keep going, or if we’ve headed down a bad path and should just stop. It can also tell us when we’ve gone far enough and should not take any further steps.

What does this look like in code? Let’s take a look:

 
TraversalDescription myTraversal = db.traversalDescription()
        .depthFirst()
        .relationships( RelationshipTypes.KNOWS, Direction.INCOMING )
        .relationships( RelationshipTypes.LIKES, Direction.INCOMING )
        .evaluator( Evaluators.toDepth( 5 ) );

The equivalent in Cypher would be something like:

 
MATCH (n)<-[:KNOWS|LIKES*5]-()

But what if you allowed the KNOWS relationship to be traversed from either direction like the documentation example?

 
TraversalDescription myTraversal = db.traversalDescription()
        .depthFirst()
        .relationships( RelationshipTypes.KNOWS )
        .relationships( RelationshipTypes.LIKES, Direction.INCOMING )
        .evaluator( Evaluators.toDepth( 5 ) );

That would a lot messier to try to write in Cypher. Notice how we are stopping at depth 5? That’s a common evaluator that comes built in with the framework. There is a dozen or so to choose from and you can mix and match them. Notice how “.relationships” is called twice? You can use this helper method to build a PathExpander. Alternative you can use a PathExpanderBuilder to make some interesting ones like:

 
PathExpanderBuilder
        .allTypesAndDirections()
        .remove(RelationshipTypes.LIKES)
        .add(RelationshipTypes.LIKES, Direction.INCOMING) 

This would restrict the PathExpander to only follow Direction.INCOMING relationships for RelationshipTypes.LIKES while following any other relationship type in either direction. Try that in Cypher. You can also completely customize your Evaluators and Expanders by writing them yourself. Let’s see how we would apply what we’ve seen to build a flight scheduling engine in Neo4j.

Our service is going to take a list of starting airports, not just one because some metropolitan areas give passengers multiple airport options. In Chicago we have O’Hare and Midway. It is going to require a list of ending airports (for the same reason). It is going to require a flight date. A list of high priority airlines (for the frequent flyer and return trip use cases). Optionally it will take a limit on the number of records that it can return or the maximum time it can take to gather those results.

We have the model from the last blog post, so refer back to it if you haven’t seen it. So the first thing we want to do is create a custom evaluator that gathers the valid flight routes (represented as paths in our traversal).

It works by taking a list of potential destinations and a maximum length. It evaluates each step along the path and if it’s not the length we require it just excludes that path as too short and continues. If we end up at a node that isn’t an AirportDay, then we exclude it and continue. If we do end up at an AirportDay node, then we check to see if it is one of the destinations we want. If it is, then we include that path and go try another path. If not, we exclude this path and go try another one.

 
public class ReachedDestinationAtEvaluator implements Evaluator {
    private ArrayList<String> destinations;
    private int length;

    public ReachedDestinationAtEvaluator(ArrayList<String> destinations, int length) {
        this.destinations = destinations;
        this.length = length;
    }

    @Override
    public Evaluation evaluate(Path path) {
        if (path.length() < length) {
            return Evaluation.EXCLUDE_AND_CONTINUE;
        }

        Node lastNode = path.endNode();
        if (!lastNode.hasLabel(Labels.AirportDay)){
            return Evaluation.EXCLUDE_AND_CONTINUE;
        } else if (destinations.contains( ((String)lastNode.getProperty("key")).substring(0,3) )) {
            return Evaluation.INCLUDE_AND_PRUNE;
        }
        return Evaluation.EXCLUDE_AND_PRUNE;
    }
}

We will use a few expanders in our query. We create a NonStopExpander by implementing a PathExpander. The constructor takes the destinations we want to reach, the Airlines we are going to try and a stopTime to stop traversing if we run out of time. The method that describes how to traverse the graph is “expand”. We want to return a collection of relationships. One way to do this is to use the length of the path and use it to guide which relationships to return. If we hit a dead end, we return an empty list like so:

 
public class NonStopExpander implements PathExpander {
    protected ArrayList<String> destinations;
    protected RelationshipType[] relationshipTypes;
    protected long stopTime;

    public NonStopExpander(ArrayList<String> destinations,  RelationshipType[] relationshipTypes, long stopTime) {
        this.destinations = new ArrayList<>();
        this.relationshipTypes = new RelationshipType[0];
        this.stopTime = stopTime;
    }

    @Override
    public Iterable<Relationship> expand(Path path, BranchState state) {
        if (System.currentTimeMillis() < stopTime) {
            switch (path.length()) {
                case 0:
                    return path.endNode().getRelationships(Direction.OUTGOING, RelationshipTypes.HAS_DESTINATION);
                case 1: {
                    Node lastNode = path.endNode();
                    if (destinations.contains((String) lastNode.getProperty("code"))) {
                        return path.endNode().getRelationships(Direction.OUTGOING, relationshipTypes);
                    } else {
                        return Collections.emptyList();
                    }
                }
                 case 2:
                 case 3: {
                     return path.endNode().getRelationships(Direction.OUTGOING, relationshipTypes);
                 }
                default:
                    return Collections.emptyList();
            }
        } else {
            return Collections.emptyList();
        }
    }
}

I’ll put the picture of the model below so it’s easier to follow. At path.length “0” we are at our starting point, so we go traverse the HAS_DESTINATION relationships. Once there our path is now length 1, so we check the Destination code to see if it matches our destinations. If it doesn’t we give up, otherwise, we continue for two hops on our AirlineFlight relationships until we reach an AirportDay node and the Evaluator takes over from here.

flight type

So how do we get those AirlineFlight relationships? We’ll create a node labeled “Metadata” with a name property of “Airlines”. We’ll create a bunch of airlines and connect them to this “metanode”. We can then query the graph for the names of the AirlineFlight relationships we should traverse.

Screen Shot 2015-09-04 at 2.18.58 PM

The code is below. We would call this the first time a query is made and anytime an airline is added/deleted or updated from our system.

 
    public static void setFlightTypes(@Context GraphDatabaseService db) {
        Set<RelationshipType> flightTypes = new HashSet<>();
        try (Transaction tx = db.beginTx()) {
            Node airlines = db.findNode(Labels.Metadata, "name", "Airlines");
            if (airlines!=null) {
                for (Relationship r : airlines.getRelationships(Direction.INCOMING, RelationshipTypes.IS_AIRLINE)) {
                    final Node carrier = r.getStartNode();
                    final String flightType = carrier.getProperty("code") + "_FLIGHT";
                    flightTypes.add(DynamicRelationshipType.withName(flightType));
                }
            }
            Service.flightTypes = flightTypes;
        }
    }

To get the order on a per query basis, we create a ListOrderedSet and add the query high priority airlines first, then all of the rest. Duplicated entries won’t change the order.

 
        // Add High Priority Airlines to be checked first, and then add the rest
        Set<RelationshipType> orderedFlightTypes = new ListOrderedSet<>();
        for ( String airline : (ArrayList<String>)input.get("airlines")) {
            orderedFlightTypes.add(DynamicRelationshipType.withName(airline + "_FLIGHT"));
        }

        // Adding full list of Airlines, duplicates do not affect original ordering
        orderedFlightTypes.addAll(flightTypes);

We can now create our expander:

 
        NonStopExpander nonStopExpander = new NonStopExpander(
                (ArrayList<String>)input.get("to"),
                flightTypes.toArray(new RelationshipType[flightTypes.size()]),
                (long)input.get("time_limit")
        );

…as well as our TraversalDescription:

 
TraversalDescription nonStopTraversalDescription = db.traversalDescription()
                            .depthFirst()
                            .expand(nonStopExpander)
                            .evaluator(reachedDestinationEvaluator)
                            .uniqueness(Uniqueness.RELATIONSHIP_PATH);

Finally we can call the “traverse” method of our TraversalDescription and add the flight properties to our results array which we convert to JSON and send back at the end.

 
            for (org.neo4j.graphdb.Path position : td.traverse(departureAirportDay)) {
                ArrayList<HashMap> flights = new ArrayList<>();
                Long distance = 0L;
                for (Node flight : position.nodes()) {
                    if (flight.hasLabel(Labels.Flight)) {
                        HashMap flightInfo = new HashMap();
                        for (String property : flight.getPropertyKeys()) {
                            flightInfo.put(property, flight.getProperty(property));
                        }
                        flights.add(flightInfo);
                        distance += ((Number) flight.getProperty("distance")).longValue();
                    }
                }

                    result.put("flights", flights);
                    result.put("distance", distance);
                    results.add(result);
            }

We do this a few more times for direct, one stop and two stop flights. I’ve split them up for my sanity, for performance reasons (remember it goes depth first) and to allow the query to limit the number of stops that are allowed on the itinerary. The complete source code is available on github as always. So make sure your hand luggage is securely stowed, your seats are upright, tray tables put away, your phones are in airplane mode, your seat-belts fastened and have a nice flight.

Tagged , , , , , , , , ,

5 thoughts on “Flight Search with the Neo4j Traversal API

  1. […] routes between them go via the planet Sihnon. If you recall a set of past blog posts modeling and traversing flights, you may remember we can do this in Neo4j by creating a Traversal Description. We can start at […]

  2. […] seconds to get the dump of all the paths and that’s just too long. Let’s switch to the Traversal API and see if it can do any better. We’ll create a url endpoint that takes the person’s […]

  3. Declan Hernon says:

    Fantastic article, I am relatively new to Neo4j, and about to try using the Traversal API for the first time. This article has given me great insight on the advantages of using the traversal API to tune queries – thank you!

  4. Girish says:

    Hi Max. Really nice article. Does the Traversal API in python (neo4jrestclient) offer all the options ?

  5. Girish says:

    Hi Max. Really Nice article. Does the traversal framework in python (neo4jrestclient) offer all the possibilities available in the Java API ?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: