Tag Archives: graph databases

Let’s build something Outrageous – Part 25: Dates in C++ and Faster Imports 

Back in February, we added the ability to load a CSV file and alter its contents while importing it. We also added Date support to RageDB using a Lua library. This was a masterful job of copy and paste and got us lots of functionality very quickly. When we timed the import for LDBC SNB SF10, it came in at 28 minutes, which wasn’t bad, but wasn’t great. Let’s try to speed that up today.
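This excerpt doesn’t show RageDB’s new date code. As a rough sketch of why parsing dates in C++ instead of Lua can be cheap, here is a parser that turns a `YYYY-MM-DD` string into days since 1970-01-01 using Howard Hinnant’s well-known `days_from_civil` algorithm. The function names are hypothetical, the input format is an assumption, and it only range-checks each field (it would accept 2021-02-30).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Howard Hinnant's days_from_civil: proleptic Gregorian date -> days since 1970-01-01.
constexpr int64_t days_from_civil(int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;                                                          // treat Jan/Feb as months of the previous year
    const int64_t era = (y >= 0 ? y : y - 399) / 400;                     // 400-year era
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // year of era [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // day of year [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // day of era [0, 146096]
    return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Parse exactly `len` ASCII digits starting at `pos`; returns -1 on a non-digit.
static int parse_digits(std::string_view s, std::size_t pos, std::size_t len) {
    int value = 0;
    for (std::size_t i = pos; i < pos + len; ++i) {
        if (s[i] < '0' || s[i] > '9') return -1;
        value = value * 10 + (s[i] - '0');
    }
    return value;
}

// Hypothetical helper: "YYYY-MM-DD" -> days since epoch, or nullopt if malformed.
// Works on a string_view, so no temporary strings are allocated per field.
std::optional<int64_t> parse_iso_date(std::string_view s) {
    if (s.size() != 10 || s[4] != '-' || s[7] != '-') return std::nullopt;
    const int y = parse_digits(s, 0, 4);
    const int m = parse_digits(s, 5, 2);
    const int d = parse_digits(s, 8, 2);
    if (y < 0 || m < 1 || m > 12 || d < 1 || d > 31) return std::nullopt;
    const int64_t days = days_from_civil(y, static_cast<unsigned>(m), static_cast<unsigned>(d));
    assert(days >= days_from_civil(0, 1, 1));  // sanity check: years are never negative here
    return days;
}
```

Storing a date as a single integer like this also makes comparisons and range filters plain integer operations.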

Let’s build something Outrageous – Part 24: Permissions and Multiple Graphs

Typically we want to Reduce, Reuse and Recycle to help the environment. But today we are going to Reduce, Reuse and Recycle the Lua Sandbox Environment to give us two additional sets of permissions. The first is “Read Write”, in which a user can read from and write to the database but cannot create new node types, relationship types, or data types. The second is “Read Only”, which does what it sounds like.
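RageDB’s real permission code and function names aren’t shown in this excerpt. As a minimal sketch, assuming each permission level is just a whitelist of callable functions, the reduce/reuse/recycle idea can look like building each level on top of the one below it. Every name here (`Permission`, `allowed_functions`, `is_allowed`, and the function names in the sets) is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical permission levels; each one is a superset of the level below it.
enum class Permission { ReadOnly, ReadWrite, Admin };

// Hypothetical query functions, grouped by what they are allowed to touch.
const std::unordered_set<std::string> kReadFunctions   = {"node_get", "node_get_property", "node_get_relationships"};
const std::unordered_set<std::string> kWriteFunctions  = {"node_add", "node_set_property", "relationship_add"};
const std::unordered_set<std::string> kSchemaFunctions = {"node_type_insert", "relationship_type_insert", "property_type_add"};

// Build the whitelist for a level by reusing the smaller levels' sets.
std::unordered_set<std::string> allowed_functions(Permission level) {
    std::unordered_set<std::string> allowed = kReadFunctions;   // everyone can read
    if (level == Permission::ReadWrite || level == Permission::Admin)
        allowed.insert(kWriteFunctions.begin(), kWriteFunctions.end());
    if (level == Permission::Admin)                              // only admins change the schema
        allowed.insert(kSchemaFunctions.begin(), kSchemaFunctions.end());
    return allowed;
}

bool is_allowed(Permission level, const std::string& function) {
    return allowed_functions(level).count(function) > 0;
}
```

In practice the three whitelists would be built once at startup rather than on every check.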

While we’re here, we’re going to one graph, two graph, rage graph, blue graph our way to multi-database support. Let’s jump in:

Let’s build something Outrageous – Part 23: Sandboxing

The idea of using a programming language to write queries against the database makes many security folks hyperventilate. To lower their heart rate and slow their breathing, we have to limit what queries can do using a technique known as “sandboxing”. The Sol2 library we are using in RageDB lets us create an “environment” in which our queries run. Let’s see how we go about doing this.
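Sol2 provides this through `sol::environment`. To keep the example free of a Lua dependency, here is a plain C++ toy of the same idea, under the assumption that a sandbox is simply a fresh table filled only with whitelisted entries copied from the full global table. The names (`make_globals`, `make_sandbox`, `call`) and the table entries are hypothetical, and Lua’s nested tables such as `os` are flattened into string keys for brevity.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <set>
#include <string>

using Function = std::function<std::string()>;

// A toy "global table" standing in for everything a full Lua state exposes.
std::map<std::string, Function> make_globals() {
    return {
        {"print",    [] { return std::string("printed"); }},
        {"os.exit",  [] { return std::string("server stopped"); }},  // dangerous in a query
        {"io.open",  [] { return std::string("file opened"); }},     // dangerous in a query
        {"node_get", [] { return std::string("node"); }},
    };
}

// Copy only whitelisted entries into a fresh environment; anything not copied
// simply does not exist from the query's point of view.
std::map<std::string, Function> make_sandbox(const std::map<std::string, Function>& globals,
                                             const std::set<std::string>& whitelist) {
    std::map<std::string, Function> env;
    for (const auto& name : whitelist) {
        auto it = globals.find(name);
        if (it != globals.end()) env.emplace(name, it->second);
    }
    return env;
}

// Look a name up the way a query would; a missing name behaves like nil.
std::optional<std::string> call(const std::map<std::string, Function>& env, const std::string& name) {
    auto it = env.find(name);
    if (it == env.end()) return std::nullopt;
    return it->second();
}
```

A whitelist is usually safer than a blacklist here: new globals added later stay hidden until someone deliberately allows them.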

Let’s build something Outrageous – Part 21: Overloading

Je n’ai fait celle-ci plus longue que parce que je n’ai pas eu le loisir de la faire plus courte. “I would have written a shorter letter, but I did not have the time.” These words, written by Blaise Pascal, are often misattributed to Mark Twain. They remind us to try to be brief. Too many people never learn this.

Let’s build something Outrageous – Part 20: I can’t live without Roaring Bitmaps

Valentine’s Day was earlier this week. Maybe you took your significant other to dinner, sent flowers or candy to your crush, or even bought a card for that special someone. I bet, however, that you didn’t profess your love to your favorite software library. I did. I love Roaring Bitmaps. Like Mariah Carey, I can’t live without it, so I won’t. I added Roaring Bitmaps to RageDB.

Let’s build something Outrageous – Part 19: LDBC Short Queries

The folks who build the database are not the same folks who use the database and that causes problems. It has been my number 1 complaint for the past decade or so. People building features in isolation can’t see the forest for the trees and the end user experience suffers. I ran into this video from Molham Aref where he puts it quite nicely:

Let’s build something Outrageous – Part 18: Load CSV

As much as we all love graphs, the rest of the world hasn’t quite caught on yet. They are still sending CSV files to each other like some sort of cavemen. We have a few options for dealing with them. One is to convert them to a specific file format and bulk load them into the database as fast as possible. Another is to stream them in as-is, one row at a time, transforming rows on the fly as needed and turning each row into one or more pieces of data. Today we’re going to go with option 2.
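A minimal sketch of option 2, assuming a simple delimited file with a header row, Unix line endings, and no quoted fields: read one line at a time and hand each row to a callback that can transform it and create whatever nodes or relationships it implies. `split_row` and `stream_csv` are hypothetical names, not RageDB’s API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Split one line on `delimiter`. Assumes no quoted fields or escaped delimiters.
std::vector<std::string> split_row(const std::string& line, char delimiter) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream stream(line);
    while (std::getline(stream, field, delimiter)) fields.push_back(field);
    if (!line.empty() && line.back() == delimiter) fields.emplace_back();  // keep a trailing empty field
    return fields;
}

using RowHandler = std::function<void(const std::vector<std::string>& header,
                                      const std::vector<std::string>& row)>;

// Stream rows one at a time: read the header, then pass every data row to `on_row`,
// which may transform it and emit zero, one, or many graph elements.
// Returns the number of data rows processed; only one row is held in memory at a time.
std::size_t stream_csv(std::istream& in, char delimiter, const RowHandler& on_row) {
    std::string line;
    if (!std::getline(in, line)) return 0;                  // empty input: no header
    const std::vector<std::string> header = split_row(line, delimiter);
    std::size_t rows = 0;
    while (std::getline(in, line)) {
        if (line.empty()) continue;                         // skip blank lines
        on_row(header, split_row(line, delimiter));
        ++rows;
    }
    return rows;
}
```

Keeping the transformation in a callback means the same streaming loop can serve every file, with only the per-row logic changing.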

Let’s build something Outrageous – Part 17: Bulk Traversals

A few years ago I was really angry at the traversal performance I was getting on a slow query. So angry that I wrote a couple thousand lines of C code just to calm myself down. That is how Neo4c came about; this blog post explains the gist of it. Neo4c was able to crank out 330 million traversals per second on a single core (it was single-threaded code) for a hand-crafted “query” written in C using 32-bit ids (so it was limited to about 4 billion nodes and 4 billion relationships). It wasn’t really a fair comparison to Neo4j, but it made me realize there was a lot of performance out there to be had. Let’s see where we are today.
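This is not Neo4c’s code. It is a minimal sketch, assuming a compressed sparse row (CSR) adjacency layout, of the kind of tight loop over 32-bit ids that bulk traversals rely on: each hop expands an entire frontier at once instead of chasing one relationship per call. `Csr`, `build_csr` and `count_traversals` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// CSR adjacency with 32-bit ids: the neighbors of node n are
// targets[offsets[n]] .. targets[offsets[n + 1] - 1].
// 32-bit ids cap the graph at std::numeric_limits<uint32_t>::max() (~4.29 billion).
struct Csr {
    std::vector<uint32_t> offsets;  // node_count + 1 entries
    std::vector<uint32_t> targets;  // one entry per relationship
};

Csr build_csr(uint32_t node_count, const std::vector<std::pair<uint32_t, uint32_t>>& edges) {
    assert(node_count < std::numeric_limits<uint32_t>::max());
    assert(edges.size() <= std::numeric_limits<uint32_t>::max());
    Csr g;
    g.offsets.assign(static_cast<std::size_t>(node_count) + 1, 0);
    for (const auto& e : edges) {
        assert(e.first < node_count && e.second < node_count);
        ++g.offsets[e.first + 1];                                     // count out-degree
    }
    for (uint32_t n = 0; n < node_count; ++n) g.offsets[n + 1] += g.offsets[n];  // prefix sums
    g.targets.resize(edges.size());
    std::vector<uint32_t> cursor(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& e : edges) g.targets[cursor[e.first]++] = e.second;
    return g;
}

// Expand the whole frontier `hops` times; returns the number of relationships traversed.
uint64_t count_traversals(const Csr& g, std::vector<uint32_t> frontier, int hops) {
    uint64_t traversed = 0;
    for (int h = 0; h < hops; ++h) {
        std::vector<uint32_t> next;
        for (uint32_t node : frontier) {
            for (uint32_t i = g.offsets[node]; i < g.offsets[node + 1]; ++i) {
                next.push_back(g.targets[i]);   // contiguous reads: cache friendly
                ++traversed;
            }
        }
        frontier = std::move(next);
    }
    return traversed;
}
```

Because neighbor lists are contiguous arrays rather than linked records, each hop is a sequential scan, which is where most of the speed comes from.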

Let’s build something Outrageous – Part 16: Adding a UI

For some crazy reason I thought I could get away with building RageDB without a UI. I was using the “Advanced REST Client” to test functionality, and it has served me well for the last few months, but it’s time we added a few coats of paint to our database server and let users interact with the graph visually instead of squinting at JSON output. As a primarily back-end dev, I find building a front end a scary proposition. I am barely getting the hang of C++ (yeah right) and now I’m going to have to write a bunch of JavaScript and Node and React… no.

Let’s build something Outrageous – Part 15: Connected

Checking whether two nodes are directly connected is something you often have to do in a graph. There are a few different ways to implement this feature, depending on how the database keeps track of relationships. In Neo4j, a doubly linked list of relationships is kept per node, grouped by relationship type in both the incoming and outgoing directions. To check whether two nodes are directly connected, one has to traverse one of the lists (preferably the shorter one) and check whether the other node’s id is in that list. If we don’t know the relationship type, we have to check all the groups for dense nodes; light nodes have no groups, so we check all of their relationships anyway.
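A minimal sketch of the list-scanning approach, assuming a toy in-memory layout where `std::vector`s stand in for Neo4j’s doubly linked relationship chains and every node keeps its lists grouped by type (this is not Neo4j’s actual storage format). The hypothetical `connected` scans the shorter of the two candidate lists; `connected_any_type` falls back to checking every group.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Per-node relationship lists grouped by type, kept in both directions.
struct Node {
    std::map<std::string, std::vector<uint64_t>> outgoing;  // type -> target node ids
    std::map<std::string, std::vector<uint64_t>> incoming;  // type -> source node ids
};

using Graph = std::vector<Node>;  // index = node id

void add_relationship(Graph& g, uint64_t from, uint64_t to, const std::string& type) {
    g[from].outgoing[type].push_back(to);
    g[to].incoming[type].push_back(from);
}

// Is there a `type` relationship from `a` to `b`? Scan whichever list is shorter:
// a's outgoing group (looking for b) or b's incoming group (looking for a).
bool connected(const Graph& g, uint64_t a, uint64_t b, const std::string& type) {
    auto out = g[a].outgoing.find(type);
    auto in = g[b].incoming.find(type);
    if (out == g[a].outgoing.end() || in == g[b].incoming.end()) return false;
    const bool use_out = out->second.size() <= in->second.size();
    const std::vector<uint64_t>& list = use_out ? out->second : in->second;
    const uint64_t wanted = use_out ? b : a;
    return std::find(list.begin(), list.end(), wanted) != list.end();
}

// Relationship type unknown: repeat the check for every group `a` has.
bool connected_any_type(const Graph& g, uint64_t a, uint64_t b) {
    for (const auto& group : g[a].outgoing)
        if (connected(g, a, b, group.first)) return true;
    return false;
}
```

Picking the shorter list matters most when one side is a dense node with millions of relationships and the other has only a handful.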

In Amazon Neptune, the SPOG index can be queried twice: once with the first node in the S position and the second node in the O position, then again with the positions reversed (with the P position holding the relationship type). If we don’t know the relationship type, we can query the index twice per relationship type.
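A toy version of the two-lookup approach, assuming a sorted `std::set` of (subject, predicate, object) tuples as a stand-in for Neptune’s SPOG index. The named-graph (G) component is dropped, and the function and type names are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>

// Toy index ordered by (subject, predicate, object); G is omitted for brevity.
using Triple = std::tuple<std::string, std::string, std::string>;
using SpoIndex = std::set<Triple>;

// Known relationship type: two lookups, first with a in S and b in O,
// then with the positions reversed.
bool spog_connected(const SpoIndex& spo, const std::string& a, const std::string& b,
                    const std::string& type) {
    return spo.count(Triple{a, type, b}) > 0 || spo.count(Triple{b, type, a}) > 0;
}

// Unknown type: repeat the pair of lookups once per relationship type.
bool spog_connected_any(const SpoIndex& spo, const std::set<std::string>& types,
                        const std::string& a, const std::string& b) {
    for (const auto& type : types)
        if (spog_connected(spo, a, b, type)) return true;
    return false;
}
```

Each lookup is a logarithmic search in a sorted structure, so the cost grows with the number of relationship types rather than with node degree.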

Checking whether two nodes are directly connected is similar to checking for set membership, and one trick we could use is a Bloom filter or one of its variant data structures. Longtime readers will remember this blog post outlining exactly how to do that and achieve 100x faster checks, including a “double check” to get around the probabilistic nature of these data structures.
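A minimal sketch of a Bloom filter with a double check, assuming node ids fit in 32 bits so a (from, to) pair packs into one 64-bit key. The filter size, hash count and splitmix64-style mixing are arbitrary choices for illustration, and the exact `std::unordered_set` stands in for whatever slower, exact lookup the database would fall back to; none of this is taken from the earlier post.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

class ConnectedFilter {
public:
    void add(uint64_t from, uint64_t to) {
        const uint64_t key = combine(from, to);
        for (uint64_t seed = 0; seed < kHashes; ++seed) bits_.set(index(key, seed));
        exact_.insert(key);
    }

    // false means "definitely not connected"; true may be a false positive.
    bool maybe_connected(uint64_t from, uint64_t to) const {
        const uint64_t key = combine(from, to);
        for (uint64_t seed = 0; seed < kHashes; ++seed)
            if (!bits_.test(index(key, seed))) return false;
        return true;
    }

    // Cheap filter first; only a "maybe" pays for the exact double check.
    bool connected(uint64_t from, uint64_t to) const {
        return maybe_connected(from, to) && exact_.count(combine(from, to)) > 0;
    }

private:
    static constexpr uint64_t kBits = 1 << 16;  // filter size in bits (illustrative)
    static constexpr uint64_t kHashes = 3;      // hash functions per key (illustrative)

    // Pack the pair into one key; assumes both ids fit in 32 bits.
    static uint64_t combine(uint64_t from, uint64_t to) {
        assert(from <= 0xFFFFFFFFu && to <= 0xFFFFFFFFu);
        return (from << 32) | to;
    }

    // splitmix64 finalizer, varied by seed, reduced to a bit position.
    static std::size_t index(uint64_t key, uint64_t seed) {
        uint64_t x = key + 0x9E3779B97F4A7C15ULL * (seed + 1);
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
        x ^= x >> 31;
        return static_cast<std::size_t>(x % kBits);
    }

    std::bitset<kBits> bits_;
    std::unordered_set<uint64_t> exact_;
};
```

Most negative checks stop at the bit array and never touch the exact structure, which is where the speedup comes from; the double check guarantees a false positive never reaches the caller.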
