In this blog post, KuzuDB creator Semih Salihoğlu makes the case that graph databases need new join algorithms. If you’ve read the blog post and came away still a bit confused then look at the image above. This image shows what happens when you try to join 3 tables. The problem is that traditionally databases have used binary joins (two tables at a time) to execute queries. The intermediate result build up of these joins can get massive and eat a ton of memory and processing power. The more binary joins you have, the worse it gets.
In Cypher, we call any unbounded star query a “Death Star” query. You’ll recognize it if you see a star between two brackets in any part of the query:
the deadly pattern of a death star query
The “star” in Cypher means “keep going”, and when it is not bound by a path length -[*..3]- or relationship type(s) -[:KNOWS|FRIENDS*]- it tends to blow up Alderaaning servers. It’s hard to find a valid reason for this query, but its less deadly cousins are very important in graph workloads.
For example when looking at fraud, we may start with a Customer node and ask, which known Fraudulent nodes are within 4 hops away? A Customer HAS an Account that was ACCESSED by a Device that ACCESSED another Account that BELONGS_TO a known Fraudster. A Customer HAS a mailing Address that is very SIMILAR to an Address that BELONGS_TO a Business that is partially OWNED by a known Fraudster. These are just two out of many valid patterns in our graph. Graph databases were designed to handle these kind of queries. The trick is thatevery node KNOWS its relationships, every node KNOWS how it is connected.
I was watching The Marvelous Mrs. Maisel and one of the more jarring issues of the first two episodes is that Midge keeps getting arrested for the things she says. It reminded me of the song “Me So Horny” from 2 Live Crew that landed the hip hop group in jail charged with obscenity. Record store owners were getting arrested for selling CDs to undercover cops. How insane does that all sound? But it strikes the point that language matters. The things we say and how we say them are powerful. They convey meaning and emotion, language can be pleasant or it can be foul.
I’ve written a ton of SQL and Cypher queries over the last 20 years…and I’ve rewritten those queries as stored procedures more times than I can count. The issues with the expressivity of the query language and the ability of the query optimizer to “do the right thing” have been around longer than my career. I’ve written about this problem before. I went so far as to completely give up. In RageDB I let the developer write the query in a programming language directly. Skipping the “middle man” and letting the user be the query optimizer. Because in the end… this is what always happens. Well almost always.
There was a show called “30 Days” where people would be inserted in to a lifestyle completely different from their beliefs to see what would happen. The idea wasn’t so much to change their mind, but to help them (and the viewer) understand a little bit of both sides. My experience with Declarative Query languages so far has led me to the belief that they were ultimately a performance dead end. I’ve always known more than the database, which allowed me to hand-craft high performance queries using stored procedures. Building RageDB I decided to stay away from query languages, going as far as letting you write part of your query in “c” if you wanted to.