Extending MySQL with VillageSQL

One of the things that made me fall head over heels for Neo4j so many years ago was just how extensible it was. If the database engineering team was busy rebuilding the clustering feature for the third time and didn’t have time to take care of my feature requests… I could just add them myself. Not to Neo4j directly, no, that would have been a horrible mess. Instead I could add any feature I wanted as an “Unmanaged Extension”. Later on they became Cypher Stored Procedures, but it was basically the same thing. You had access to the top level Java API that dealt with Nodes and Edges. You could use the Traversal API that dealt with Paths… and if you were feeling extra spicy that day you could go down to the Storage API that dealt with Cursors over raw bytes.

I had spent prior jobs working with Oracle and Microsoft SQL Server, so I had never had that kind of power and freedom before. Well, it took a long time, but that power has come to MySQL in the form of a change tracking fork called VillageSQL. There are already a bunch of extensions that add UUID and Network Address custom types, Cryptographic Functions, Multi-Dimensional Geometry, as well as AI helpers. So of course I had to try it out. I decided to add an extension for one of my other great loves, the Roaring Bitmap data structure.

Continue reading

Decision Graphs

About eight years ago, I wrote a little blog post on dynamic decision trees. If you have the time, go back and read that and the follow-up posts on the subject. If not, I’ll summarize. Instead of building a Rules Engine to make decisions, I built a dynamic traverser that evaluates logic on the fly. This enables us to change how the decisions are made at any time. We can also version the trees as we change them and create completely new trees for any complex decision making. If we don’t have enough information to get to a conclusion, the traverser doesn’t give up or make something up; instead it asks us a question we must answer to continue down the path.
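To make the idea concrete, here is a minimal sketch of such a traverser in Python. It is not the code from the original posts: the `Rule` and `Answer` classes, the `decide` function and the bar-entry tree are all hypothetical names made up for illustration. The point is only that the logic lives in the tree itself, and that a missing fact produces a question rather than a guess.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Answer:
    """Leaf node: a conclusion the traversal can stop on."""
    value: str


@dataclass
class Rule:
    """Inner node: tests one fact, then follows the yes or no branch."""
    fact: str                       # name of the fact this rule needs
    test: Callable[[object], bool]  # evaluated on the fly against that fact
    yes: "Node"
    no: "Node"


Node = Union[Answer, Rule]


def decide(node: Node, facts: Dict[str, object]) -> str:
    """Walk the tree; if a needed fact is missing, ask for it instead of guessing."""
    while isinstance(node, Rule):
        if node.fact not in facts:
            return f"QUESTION: what is '{node.fact}'?"
        node = node.yes if node.test(facts[node.fact]) else node.no
    return f"ANSWER: {node.value}"


# Hypothetical tree: may this person enter the bar?
tree = Rule(
    fact="age",
    test=lambda age: age >= 21,
    yes=Answer("allowed"),
    no=Rule(
        fact="with_guardian",
        test=lambda present: bool(present),
        yes=Answer("allowed"),
        no=Answer("denied"),
    ),
)

print(decide(tree, {"age": 30}))
print(decide(tree, {"age": 18}))
print(decide(tree, {"age": 18, "with_guardian": True}))
```

Changing a decision means swapping a node in the tree, not redeploying code, and keeping the old tree around gives you a version of how decisions used to be made.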

I bring up this relic from the past because I’m still trying to wrap my head around the value of “Context Graphs”, which are supposed to be graphs that capture how, when and why decisions were made instead of what decision was made. The idea is to capture “decision traces”, which then allow an AI agent to see precedents (how similar problems were solved) and apply those same rules to new situations. The first decision traces would be made by people, and then the agents could learn and take it from there. So every time you want a decision answered, your agents would read all the previous decisions, analyze them, and make a new decision. But why? If we made a policy change, would the AI agents now ignore all the previous decisions and utilize the new policy? Would we have to repopulate the graph with human-made decisions following the new policy so the agent understood there was a change?

Continue reading

Improving Performance with Flame Graphs

Yes, I’m slowly but surely getting on the generative AI bandwagon. The eye-catching image above was generated in Lexica; it’s not perfect, but our mind tricks us into accepting it. I am not a fan of asking these new AI systems questions and getting answers that only look like correct answers… but we’re not talking about that today. Instead we’ll be looking at improving the performance of RageDB using “perf” and “FlameGraphs”, which really should have been called “FlameCharts” since it’s a chart, not a graph, but let’s not go there either.

Continue reading

Counting Triangles

Our last post was about a Triangle Count query. It referenced another blog post from Kuzu where they explained their use of a Multi Way Join Algorithm to count 41 million triangles in 1.62 seconds. Using only binary joins it would take them 51.17 seconds to achieve the same result. My attempts to run the query using Lua on Rage landed at 9.5 seconds one node at a time and 5.6 seconds using the batch queries. So that got me thinking, how about Neo4j?

Continue reading

All of NoSQL is because of this…

In this blog post, KuzuDB creator Semih Salihoğlu makes the case that graph databases need new join algorithms. If you’ve read the blog post and came away still a bit confused, then look at the image above. This image shows what happens when you try to join 3 tables. The problem is that databases have traditionally used binary joins (two tables at a time) to execute queries. The buildup of intermediate results from these joins can get massive and eat a ton of memory and processing power. The more binary joins you have, the worse it gets.
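A toy illustration of that buildup, in Python. This is not Kuzu's algorithm or any real query engine, just a sketch under my own assumptions: `triangles_binary` joins two tables at a time and materializes every 2-hop path before checking for the closing edge, while `triangles_intersect` looks at all three sides at once by intersecting neighbor sets. The function names and the layered test graph are made up for the example.

```python
from collections import defaultdict


def triangles_binary(edges):
    """Join edges with edges, materialize every 2-hop path, then join again."""
    out = defaultdict(set)
    for a, b in edges:
        out[a].add(b)
    # First binary join: (a)->(b) joined with (b)->(c).
    paths = [(a, b, c) for a, b in edges for c in out.get(b, ())]
    # Second binary join: keep only the paths closed by an (a)->(c) edge.
    edge_set = set(edges)
    closed = [p for p in paths if (p[0], p[2]) in edge_set]
    return len(closed), len(paths)


def triangles_intersect(edges):
    """For each edge, intersect both neighbor sets; no 2-hop paths are stored."""
    out = defaultdict(set)
    for a, b in edges:
        out[a].add(b)
    return sum(len(out[a] & out.get(b, set())) for a, b in edges)


# Hypothetical graph: three layers of 10 nodes, fully connected layer to layer,
# plus a single edge (0, 20) that closes a handful of triangles.
edges = (
    [(a, b) for a in range(10) for b in range(10, 20)]
    + [(b, c) for b in range(10, 20) for c in range(20, 30)]
    + [(0, 20)]
)

count, intermediate = triangles_binary(edges)
print(count, "triangles found after building", intermediate, "intermediate rows")
print(triangles_intersect(edges), "triangles found with set intersections")
```

Both functions agree on the answer, but the binary version had to build two orders of magnitude more rows than it kept. On a graph with millions of edges, that gap is the whole story.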

Continue reading

Bullshit Graph Database Performance Benchmarks

Hey HackerNews, let me just drop my mixtape, check out my SoundCloud and “Death Row” is the label that pays me.

How is the Graph Database category supposed to grow when vendors keep spouting off complete bullshit? I wrote a bit about the ridiculous benchmark Memgraph published last month hoping they would do the right thing and make an attempt at a real analysis. Instead these clowns put it on a banner on top of their home page. So let’s tear into it.

At first I considered replicating it using their own repository, but it’s about 2000 lines of Python and I don’t know Python. Worse still, the work is under a “Business Source License” which states:

Continue reading

Death Star Queries in Graph Databases

Star Wars: Episode IV - A New Hope Death Star

In Cypher, we call any unbounded star query a “Death Star” query. You’ll recognize it if you see a star between two brackets in any part of the query:

-[*]-

the deadly pattern of a death star query

The “star” in Cypher means “keep going”, and when it is not bound by a path length -[*..3]- or relationship type(s) -[:KNOWS|FRIENDS*]- it tends to blow up Alderaaning servers. It’s hard to find a valid reason for this query, but its less deadly cousins are very important in graph workloads.

For example, when looking at fraud, we may start with a Customer node and ask: which known Fraudulent nodes are within 4 hops? A Customer HAS an Account that was ACCESSED by a Device that ACCESSED another Account that BELONGS_TO a known Fraudster. A Customer HAS a mailing Address that is very SIMILAR to an Address that BELONGS_TO a Business that is partially OWNED by a known Fraudster. These are just two out of many valid patterns in our graph. Graph databases were designed to handle these kinds of queries. The trick is that every node KNOWS its relationships, every node KNOWS how it is connected.
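Here is a rough sketch of what a bounded search like that boils down to, written in plain Python rather than Cypher. The `within_hops` function and the toy graph are hypothetical, and there is one simplification to keep in mind: this is a breadth-first search that reports each fraudster once at its shortest distance, whereas a Cypher pattern would enumerate every matching path. Each node "knowing" its relationships is modeled here as an adjacency list.

```python
from collections import deque


def within_hops(graph, start, targets, max_hops):
    """Breadth-first search that stops expanding after max_hops relationships."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = {}
    while frontier:
        node, depth = frontier.popleft()
        if node in targets and node != start:
            found[node] = depth
        if depth == max_hops:
            continue  # the bound is what keeps this from being a Death Star
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return found


# Hypothetical, undirected toy graph: each node lists the nodes it connects to.
graph = {
    "customer": ["account1", "address1"],
    "account1": ["customer", "device1"],
    "device1": ["account1", "account2"],
    "account2": ["device1", "fraudster1"],
    "fraudster1": ["account2"],
    "address1": ["customer", "address2"],
    "address2": ["address1", "business1"],
    "business1": ["address2", "fraudster2"],
    "fraudster2": ["business1"],
}

fraudsters = {"fraudster1", "fraudster2"}
print(within_hops(graph, "customer", fraudsters, max_hops=4))
print(within_hops(graph, "customer", fraudsters, max_hops=3))
```

With a bound of 4 both fraudsters turn up; with a bound of 3 neither does. Remove the bound entirely and the search touches everything connected to the starting node, which is exactly the behavior the star warns you about.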

Continue reading

KHop Baby One More Time

Some of the most beloved songs by mainstream artists were written by Max Martin. The song “Baby One More Time” came out in 1999 and sold over 10 million copies. It propelled Britney Spears into pop stardom. If we were to look at the graph above, with Max Martin in the center, then one hop away are the songs he wrote which would become #1s on the Billboard Hot 100. Two hops away are the Artists that performed those #1 songs. It is beyond question that Max Martin knows how to write good pop songs. I wish I had his talent. I only know how to write halfway decent KHop implementations.

Continue reading

Dark Mode

I tend to only find time to work on RageDB at night. Staring at code in CLion using the “Darcula” theme works great. But like a vampire exposed to direct sunlight, things go horribly wrong when I try to test what I am working on using the RageDB front-end. You see, apart from the code editor, the rest of the interface is very bright. Blindingly so:

Continue reading

Observations of the LDBC SNB Benchmark

RageDB Mascot running

I’m still trying to figure out the right look for the language folks will use to talk to RageDB. Instead of waiting until I have it figured out, I decided I should write all the queries for the LDBC SNB Benchmark to prepare for a full run in the next few months. Now that we added “stored procedures” to RageDB, the benchmark code is trivial. I send a POST request to the Lua URL with the name of the query plus any parameters it may need, which are in a CSV file. Here is Short Query 4, for example; they all look like this apart from the different parameters:

Continue reading