Neo4j Geospatial Queries

When I was growing up, the Neo Geo was the high-end gaming system to have. It was, however, prohibitively expensive for most people… and definitely out of my price range. I grew up in a mobile home park in Union City, CA, near the old drive-in theater, now long gone. It was also next to an industrial park in Hayward where a food truck would make the best burritos $3 could buy. A search for the best burritos in Union City would have missed this food truck gem. Geographic boundaries can be a problem when searching for things by specific places. To get around this problem, we tend to use latitude and longitude and then perform a radius or bounding box search. Today I want to present to you a hybrid approach using Neo4j.

A while back I showed you how to import the Cities database from MaxMind using an extension, and then once more using a Neo4j stored procedure. That model, if you recall, looked like this:

What you would do with this model is create a relationship between a User or Restaurant or whatever you wanted to keep track of and a “general” location. For example, it would work great for finding restaurants, jobs or someone to date in your city. But we still have the problem of discrete boundaries and missing out on that $3 burrito. As I mentioned recently, we are adding spatial capabilities to Neo4j 3.4, so let’s update our model by adding latitude and longitude to our Location nodes by importing the IPv4 CSV file. So go ahead and check out the existing stored procedure and follow along with the instructions to build it, add it to the plugins folder, configure it to run and restart Neo4j.

Next you will need to grab the MaxMind GeoLite2 City (csv) zip file and extract the contents. Then we can use the stored procedure to create our schema:

 
CALL com.maxdemarzi.schema.generate;

Then import the location nodes:

 
CALL com.maxdemarzi.import.locations("/Users/maxdemarzi/Projects/import_maxmind_sproc/src/main/resources/data/GeoLite2-City-Locations-en.csv");

…and add the lat/lon data to the cities:

 
CALL com.maxdemarzi.import.ip4("/Users/maxdemarzi/Projects/import_maxmind_sproc/src/main/resources/data/GeoLite2-City-Blocks-IPv4.csv");

At the end of our imports, the Cities look like this:

Now we can add spatial capabilities. We start by creating an index on the currently undefined “location” property of the City nodes.

 
CREATE INDEX ON :City(location);

Then we set the location property of all cities to be a point made from the latitude and longitude and crs (Coordinate Reference System):

 
MATCH (c:City)
SET c.location = point({latitude: c.latitude, longitude: c.longitude, crs: 'WGS-84'});
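
For geographic points like these, Neo4j's distance function returns meters, and to the best of my knowledge it is based on a haversine calculation over a spherical earth. If you want to sanity check distances outside the database, here is a minimal Python sketch of that formula. The earth radius constant and the two sets of coordinates are assumptions for illustration (rough city centers, not the values from the MaxMind file), so expect small differences from what Neo4j reports.

```python
import math

# Mean earth radius in meters; an assumed constant for the spherical approximation.
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Approximate city-center coordinates, for illustration only.
union_city = (37.5934, -122.0438)
hayward = (37.6688, -122.0808)

d = haversine_m(*union_city, *hayward)
print(f"{d:.0f} m apart, within 10 km: {d <= 10_000}")
```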

Now let’s find Union City, CA and the neighboring cities within 10 kilometers:

 
MATCH (c:City)-[:IN_LOCATION]->(s:State), (c2:City)
WHERE c.name = "Union City"
  AND s.name = "California"
  AND distance(c.location, c2.location) <= 10000
RETURN c2.name, c2.latitude, c2.longitude, distance(c.location, c2.location)

We get:

So let’s try adding our food truck to Hayward:

 
CREATE (truck:Vendor {name:"Burrito Truck"})-[:SELLS]->(burrito:Food {name:"Burrito"})
WITH truck
MATCH (c:City)-[:IN_LOCATION]->(s:State)
WHERE c.name = "Hayward"
  AND s.name = "California"
CREATE (truck)-[:IN]->(c)

…and let’s see if we can find it again.

 

MATCH (c:City)-[:IN_LOCATION]->(s:State), (c2:City)<-[:IN]-(v:Vendor)-[:SELLS]->(f:Food {name:"Burrito"})
WHERE c.name = "Union City"
  AND s.name = "California"
  AND distance(c.location, c2.location) <= 10000
RETURN c2.name, v.name

Awesome!
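
To make the idea behind that last query concrete outside of Cypher, here is a rough in-memory Python sketch: vendors carry no coordinates of their own, only a link to a city, and the radius filter runs on the cities. All of the data here is hypothetical (the coordinates are approximate city centers, and "Mission Taqueria" is a made-up vendor added only to show something being filtered out), and the haversine helper stands in for Neo4j's distance function.

```python
import math


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) tuples."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlambda = math.radians(b[1] - a[1])
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


# Hypothetical stand-in for the City nodes: name -> (latitude, longitude).
cities = {
    "Union City": (37.5934, -122.0438),
    "Hayward": (37.6688, -122.0808),
    "San Francisco": (37.7749, -122.4194),
}

# Hypothetical stand-in for Vendor nodes and their IN / SELLS relationships.
vendors = [
    {"name": "Burrito Truck", "city": "Hayward", "sells": {"Burrito"}},
    {"name": "Mission Taqueria", "city": "San Francisco", "sells": {"Burrito"}},
]


def vendors_near(city_name, food, radius_m):
    """Find vendors selling `food` in any city within `radius_m` of `city_name`."""
    origin = cities[city_name]
    # Step 1: the geospatial filter touches only cities, never vendors.
    nearby = {name for name, loc in cities.items()
              if haversine_m(origin, loc) <= radius_m}
    # Step 2: follow the vendor-to-city links for the cities that qualified.
    return [(v["city"], v["name"]) for v in vendors
            if v["city"] in nearby and food in v["sells"]]


print(vendors_near("Union City", "Burrito", 10_000))
```

The point of the split is that only the cities need coordinates and distance checks; everything attached to a city comes along by following its link.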

So instead of having a latitude/longitude for every Restaurant, Job, Person, etc. in the graph, I connect them to their City (or it could be a zip code or neighborhood) and then use that in my geospatial queries. In some cases this approach is better for privacy reasons; in others, we just want to avoid storing millions of lat/lon points if our use case doesn’t need distances to be that precise. Anyway, enjoy, and remember this is a 3.4 feature, so grab the pre-release until it goes live in 2018 Q2 (sometime after Cinco de Mayo I hear).


7 thoughts on “Neo4j Geospatial Queries”

  1. […] also demos another new feature of Neo4j 3.4 – geo-spatial indexes. In his blog post, he describes how to use them to find the right type of food place for your tastes via the […]

  2. Rashid Z. Muhammad says:

    Very useful! Any idea whether we may get a native contains() function that lets us query for points within polygons anytime soon?

    • craigtaverner says:

      We had hoped to add contains already in this release, but it was not possible to fit it in. So much more infrastructure is required to add the concept of ‘polygons’. Perhaps we get a chance in the next release, we’ll have to see.

      For now, however, there is the ability to do contains within a bounding box. You just use the normal Cypher range query where the two values are the bottom left and top right of the box:

      MATCH (p:Point)
      WHERE p.location < point({x:5,y:6}) AND p.location > point({x:4,y:3})
      RETURN p

      So if you calculate the bounding box in your application around your polygon, then call the above query, you will get a candidate set of points, and you can post-filter them in the app. The above query will be supported by the index, so should be fast even on larger datasets.

  3. Francisco says:

    Nice post. Looking forward to new geospatial capabilities in the next release!

  4. wefreema says:

    Came for the awesome burrito picture, stayed for the Geospatial. :)

  5. […] and query using our distance function or you query within a bounding box. Max DeMarzi blogged about this as did Neo4j. At GraphConnect 2018, Will Lyon and Craig Taverner had a session on Neo4j and Going […]

  6. John says:

    Just started learning neo4j and this is something I really needed! One thing I wanted to add was the use of all cities in a specific metro code and this procedure created duplicate metro codes for each city in the US (roughly 20k additional nodes when there are 210 distinct metros in the MaxMind db).

    I created this cypher query to make the changes (which worked for me but would like to know if there’s a better way):

    MATCH (mc:Metro) WITH collect(DISTINCT mc.code) as metrolist
    UNWIND metrolist AS metronumber WITH metronumber
    MATCH (m:Metro {code: metronumber}) WITH metronumber, collect(m) as nodes
    WHERE size(nodes) > 1
    CALL apoc.refactor.mergeNodes(nodes, { properties: "discard", mergeRels: true }) YIELD node
    RETURN metronumber, node

    Thanks for all these great posts!
