Composite Indexes in Neo4j 4.0

Neo4j 4.0 has officially launched and folks are very excited about some of the new features including multiple graphs and schema based security. But you should also be excited about some of the more “quality of life” improvements in the product. One we will talk about today is the upgrade that composite indexes received.

Just last month I showed you a modeling trick to get around the Neo4j 3.5 limitation that composite indexes could not speed up range queries. Let’s revisit that now since that limitation has been removed. Make sure you have the latest version of Neo4j Desktop installed and create a new Neo4j 4.0 Graph. Before you spin it up, go to the settings and increase the “dbms.memory.heap.max_size” to “8G” or “12G” in order to deal with these big transactions. Alright now remember the issue was finding the last few transactions for a particular user when you have users with millions of transactions (think businesses instead of people). Instead of trying to combine the node id of the user and the timestamp of the transaction into a single property, we’re just going to create these as normal. We’ll use the username and a datetime field for our properties.

FOREACH (r IN range(1,1000000) | 
   CREATE (:Transaction {username: "max", 
          date: (datetime() - duration('P' + ceil(1461 * rand()) + 'D'))} )
)

Let’s first try to get the last 25 transactions without creating any indexes:

PROFILE MATCH (tx:Transaction) 
WHERE tx.date < datetime() AND tx.username = "max"
RETURN tx
ORDER BY tx.date DESC
LIMIT 25

That query comes in at 911ms. That’s not very fast. Alright let’s add our index.

CREATE INDEX composite FOR (n:Transaction) ON ( n.username, n.date )

Give it a few seconds to populate. You can check to see if it’s done populating and online with this command:

CALL db.indexes()

Let’s try our query again.

PROFILE MATCH (tx:Transaction) 
WHERE tx.date < datetime() AND tx.username = "max"
RETURN tx
ORDER BY tx.date DESC
LIMIT 25

Cool! We’re down to 381ms, that sure beats 911ms but is that the best we can do?

Of course not. Remember when we created the composite index we used “username, date” but our query only uses “date” and it does it in a descending order. What happens if we change the query to match the index:

PROFILE MATCH (tx:Transaction) 
WHERE tx.date < datetime() AND tx.username = "max"
RETURN tx
ORDER BY tx.username, tx.date
LIMIT 25

2ms! Way way lower than before. But this gives us the wrong answer since we want the last 25 transactions in descending order. Let’s add this bit back to our query.

PROFILE MATCH (tx:Transaction) 
WHERE tx.date < datetime() AND tx.username = "max"
RETURN tx
ORDER BY tx.username, tx.date DESC
LIMIT 25

1719ms?!? What in the world just happened? We made it worse than the unindexed query, what did we do wrong? Oh… right we forgot to make the username go in descending order too in order to match the index (get the hint?):

PROFILE MATCH (tx:Transaction) 
WHERE tx.date < datetime() AND tx.username = "max"
RETURN tx
ORDER BY tx.username DESC, tx.date DESC
LIMIT 25

…and we’re back down to 2ms with the correct results!

Now, wait a minute. You may be thinking. The username is always going to be the same, why do we have to indicate an order at all, shouldn’t the Cypher query planner figure this out for us? Yes. The answer to that question is yes, and it will. But this is a .0 release which means it still has a few quirks and not all the new features are fully integrated into the product. You want to see something interesting? We can create another composite index with the order reversed.

CREATE INDEX composite_2 FOR (n:Transaction) ON ( n.date, n.username )

Now try the original query:

PROFILE MATCH (tx:Transaction) 
WHERE tx.date < datetime() AND tx.username = "max"
RETURN tx
ORDER BY tx.date DESC
LIMIT 25

…and it looks at the wrong index. Crap. Ok try using an INDEX HINT to tell it to use the second composite index we created:

PROFILE MATCH (tx:Transaction) 
USING INDEX tx:Transaction(date, username)
WHERE tx.date < datetime() AND tx.username = "max"
RETURN tx
ORDER BY tx.date DESC
LIMIT 25

Back in business at 2ms! So what’s the take away? Make your query match your index or your index match your query if you want to take full advantage of composite index range queries.

Tagged , , , , , , , ,

Leave a comment