Query Optimizers are made of sand

I’ve written a ton of SQL and Cypher queries over the last 20 years…and I’ve rewritten those queries as stored procedures more times than I can count. The issues with the expressivity of the query language and the ability of the query optimizer to “do the right thing” have been around longer than my career. I’ve written about this problem before. I went so far as to completely give up. In RageDB I let the developer write the query in a programming language directly. Skipping the “middle man” and letting the user be the query optimizer. Because in the end… this is what always happens. Well almost always.

At the 15th LDBC Technical User Community event I ran into this slide deck from the folks at Umbra.

You may not know who they are, but they are the same folks that built Hyper, both of which were able to complete the LDBC Labeled Subgraph Query Benchmark while the other databases failed miserably.

They managed to do so by rewriting their queries until the got a 10x boost in performance. Since they are the people who literally wrote the actual database, they are the top experts of how to write the queries to achieve the highest “Mechanical Sympathy” and get the best query times.

The people who will end up being users of Umbra (or any database really) will on average never be as capable of this feat as the creators of the database. Think about it. 10x. The queries ran ten times faster by knowing the right incantations to type to this “declarative” language. That defeats the whole purpose. If you take the 10 minutes Hyper took and multiply it by 10, then you are at 100 minutes…. and over the 90 minute timeout in their test.

What’s the point of a declarative language if you have to be an expert in not only the “mechanics” of the database engine, but also how to get that database to translate the query language into those database engine instructions. It’s a two step process. Like eating ice cream from a double pendulum! Chaos! You’re gonna need lots of napkins.

Astute readers will recall I wrote “almost always”. There is one database engine out there, even more outrageous than RageDB. They have built a Semantic Optimizer far more capable than your standard run of the mill relational database query optimizers, and I believe they really have a shot. I believe it so much I joined the company.

Tagged , , , ,

Leave a comment