Let’s build something Outrageous – Part 10: Nulls

We decided that our Nodes and Relationships will have a Schema in our database. If we create a User Node Type and set an “age” property type as an Integer, then all User nodes will have that property as an Integer. This idea seems simple enough, but what happens if we truly don’t know the value of a property for a node? What if the user hasn’t told us their age or, for one of many perfectly valid reasons, does not know what their age is? Or what if an age property that was once set is deleted? What do we do now?

In Part 5 of this series we talked about how we store properties, but not really how we delete them. In my first design I had a “tombstone” (a value indicating this data is no longer here) for each type. For Strings it would be an empty string, for Integers it was the lowest negative value allowed, etc. But what do we do about Booleans? A Boolean is either true or false; it doesn’t lend itself to a tombstone-style value. And for the types that do, does it even make sense?
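
To make the problem concrete, here is a minimal sketch of what per-type tombstones could look like. The names STRING_TOMBSTONE, INTEGER_TOMBSTONE and isDeleted are made up for this illustration and are not taken from the RageDB source.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical per-type tombstones, mirroring the first design described above.
const std::string STRING_TOMBSTONE = "";
constexpr int64_t INTEGER_TOMBSTONE = std::numeric_limits<int64_t>::min();

// A value counts as "deleted" when it equals the tombstone for its type.
bool isDeleted(const std::string& value) { return value == STRING_TOMBSTONE; }
bool isDeleted(int64_t value) { return value == INTEGER_TOMBSTONE; }

// There is no sensible isDeleted(bool): both true and false are real values,
// so no spare value is left over to act as the tombstone.
```

Notice that a user who legitimately stores an empty string would also look “deleted” here, which is part of why this approach feels shaky even for the types that do have a spare value.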

Imagine I created a User node with a name of “max” and an age of 42. If I then deleted that age and requested the node, I would see:

{ 
  "name": "max", 
  "age":-9223372036854775808 
}

That just looks plain ugly to me. So we have two options. Option one is to replace the value with “null”, which is valid JSON, and let the user deal with it.

{ 
  "name": "max", 
  "age": null 
}

Option two is we remove “age” altogether from that result.

{
  "name": "max" 
}

I don’t know about you, but every time I see that null value I get SQL flashbacks and my mind recalls a video from Tony Hoare talking about the billion dollar mistake that was null references. So Option 2 it is. Let’s implement it.

You can see all the changes in one commit if you’d like, but let’s walk through them. The first thing we need is a map that tracks, for each property, which nodes or relationships of that type have that property deleted. Remember that each node type and relationship type has its own Properties class, so following along with our example from above we would have a map of { “name”: Roaring64Map, “age”: Roaring64Map }.

class Properties {
    ...
    // For each property key, the ids whose value for that key is marked as deleted.
    tsl::sparse_map<std::string, Roaring64Map> deleted;
};

Now whenever we get a property or all the properties from a Node or Relationship we need to check them:

    std::any Properties::getProperty(const std::string& key, uint64_t index) {
        if (types.find(key) != types.end()) {
            if (!deleted[key].contains(index)) {
            ...

    std::map<std::string, std::any> Properties::getProperties(uint64_t index) {
        std::map<std::string, std::any> properties;
        for (auto const&[key, type_id] : types) {
            if (!deleted[key].contains(index)) {
            ...

For example, in getProperties we build a temporary map of string to any. We go through all the property types this Node or Relationship type is supposed to have, and if an entry has been deleted, we skip adding it to the map. That’s pretty simple. If we changed our minds and wanted to go the null direction, we would return a std::any without a value instead.
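
Here is a small, self-contained sketch of both directions. It is not the real Properties class: MiniProperties is a made-up name, it only handles integers, and std::unordered_set<uint64_t> stands in for Roaring64Map so the example needs nothing beyond the standard library.

```cpp
#include <any>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// Simplified, integer-only stand-in for the Properties class (an assumption).
struct MiniProperties {
    std::map<std::string, std::vector<int64_t>> integers;         // key -> values by id
    std::map<std::string, std::unordered_set<uint64_t>> deleted;  // key -> deleted ids

    bool isDeleted(const std::string& key, uint64_t index) const {
        auto marks = deleted.find(key);
        return marks != deleted.end() && marks->second.count(index) > 0;
    }

    // Option 2: deleted properties are simply left out of the result.
    std::map<std::string, std::any> getProperties(uint64_t index) const {
        std::map<std::string, std::any> result;
        for (auto const& [key, values] : integers) {
            if (!isDeleted(key, index) && index < values.size()) {
                result[key] = values[index];
            }
        }
        return result;
    }

    // The "null direction": a deleted property comes back as an empty std::any.
    std::any getProperty(const std::string& key, uint64_t index) const {
        auto values = integers.find(key);
        if (values == integers.end() || isDeleted(key, index) ||
            index >= values->second.size()) {
            return std::any();
        }
        return values->second[index];
    }
};
```

A caller of getProperty would check has_value() on the result before casting it, which is roughly the null check we were trying to avoid in the first place.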

Now when we set a property we need to make sure it’s not marked as deleted:

bool Properties::setIntegerProperty(const std::string &key, uint64_t index, int64_t value) {
...
   integers[key][index] = value;
   deleted[key].remove(index);

Ok, so when do we mark them as deleted? Well, whenever we delete the property or all the properties of course:

    bool Properties::deleteProperty(const std::string& key, uint64_t index) {
        if (types.find(key) != types.end()) {
            deleted[key].add(index);
            ...

    bool Properties::deleteProperties(uint64_t index) {
        for (auto const& [key, type_id] : types) {
            deleted[key].add(index);
        }
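
To see set and delete working together, here is a rough sketch of the round trip. IntegerProperties is a hypothetical, integer-only stand-in, with std::set<uint64_t> playing the role of Roaring64Map.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical, integer-only stand-in for Properties.
struct IntegerProperties {
    std::map<std::string, std::map<uint64_t, int64_t>> integers;  // key -> id -> value
    std::map<std::string, std::set<uint64_t>> deleted;            // key -> deleted ids

    void set(const std::string& key, uint64_t index, int64_t value) {
        integers[key][index] = value;
        deleted[key].erase(index);   // a freshly set value is no longer deleted
    }

    bool remove(const std::string& key, uint64_t index) {
        if (integers.find(key) == integers.end()) return false;  // unknown property
        deleted[key].insert(index);  // only the mark changes, the value stays put
        return true;
    }

    std::optional<int64_t> get(const std::string& key, uint64_t index) {
        auto values = integers.find(key);
        if (values == integers.end() || deleted[key].count(index) > 0) {
            return std::nullopt;
        }
        auto value = values->second.find(index);
        if (value == values->second.end()) return std::nullopt;
        return value->second;
    }
};
```

Setting age to 42, removing it, and then setting it to 43 would make get return 42, then nothing, then 43.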

But what if they were never created at all? Then what? Well, when we add new nodes or relationships to their corresponding types, we call deleteProperties with the new internal_id as our index to mark every property as deleted. The set methods we saw earlier then unmark each property as it is set.

    bool NodeTypes::addId(uint16_t type_id, uint64_t internal_id) {
        if (ValidTypeId(type_id)) {
            deleted_ids[type_id].remove(internal_id);
            properties[type_id].deleteProperties(internal_id);
            ...

    bool RelationshipTypes::addId(uint16_t type_id, uint64_t internal_id) {
        if (ValidTypeId(type_id)) {
            deleted_ids[type_id].remove(internal_id);
            properties[type_id].deleteProperties(internal_id);

The only trick is that we have to do this BEFORE we set the properties from JSON:

relationship_types.addId(rel_type_id, internal_id);
relationship_types.setPropertiesFromJSON(rel_type_id, internal_id, properties);
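
The ordering matters because deleteProperties marks everything, including anything we just set. A minimal sketch, using a hypothetical SchemaProperties struct that only tracks the deleted marks (value storage is left out, and std::set<uint64_t> stands in for Roaring64Map):

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in: the schema's property keys plus per-key deleted marks.
struct SchemaProperties {
    std::vector<std::string> keys;                       // e.g. "name", "age"
    std::map<std::string, std::set<uint64_t>> deleted;   // key -> deleted ids

    // Called when an id is added: every property starts out marked as deleted.
    void deleteProperties(uint64_t index) {
        for (auto const& key : keys) deleted[key].insert(index);
    }

    // Setting a property clears its deleted mark.
    void setProperty(const std::string& key, uint64_t index) {
        deleted[key].erase(index);
    }

    // How many properties would show up in the result for this id.
    std::size_t visibleCount(uint64_t index) const {
        std::size_t count = 0;
        for (auto const& key : keys) {
            auto marks = deleted.find(key);
            if (marks == deleted.end() || marks->second.count(index) == 0) ++count;
        }
        return count;
    }
};
```

Calling deleteProperties and then setProperty for “name” leaves one visible property. Swapping the two calls leaves none, because the mark is added after the value was set.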

Alright, let’s create the Schema for a User node with name:string and age:integer properties:

So we can see we have both properties. Now let’s delete the “age” property:

The “age” property should now be deleted, so let’s go ahead and get the User node Max again:

… and there we have it. Just the “name” property comes back; the “age” property has disappeared from our JSON result. The value still lives in the database intact, so we could theoretically add an “undelete” or “undo” which would unset the bit in our Roaring bitmap and we’d have that property back. Or we could replace it with a tombstone value if we wanted a real “deletion”. Anyway, I think this design works well when the great majority of the nodes or relationships of a type have all the properties defined in our Schema. If we’re looking at a sprinkle of them having a particular property and 99% not having it, then a different design would be better.
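
Neither “undelete” nor a real deletion exists yet, but a rough sketch of the idea could look like the following. SoftDeleteColumn, undelete and purge are hypothetical names for a single integer property, with std::set<uint64_t> standing in for the Roaring bitmap.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <set>
#include <vector>

// Hypothetical single-property column with soft deletes.
struct SoftDeleteColumn {
    std::vector<int64_t> values;   // stored values, indexed by internal id
    std::set<uint64_t> deleted;    // ids whose value is marked as deleted

    void remove(uint64_t index) { deleted.insert(index); }

    // "Undelete": unset the mark; the old value was never erased.
    void undelete(uint64_t index) { deleted.erase(index); }

    // "Real" deletion: overwrite the stored value with a tombstone as well.
    void purge(uint64_t index) {
        if (index < values.size()) {
            values[index] = std::numeric_limits<int64_t>::min();
        }
        deleted.insert(index);
    }

    std::optional<int64_t> get(uint64_t index) const {
        if (index >= values.size() || deleted.count(index) > 0) return std::nullopt;
        return values[index];
    }
};
```

One catch with this sketch: undeleting a property that was never set would expose whatever default sits in storage, so a real version would need to tell “deleted” apart from “never set”.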

We could store “special” properties in a map<std::string, std::any> per type, with node_id + property as the string key and the property value as the value. But we’ll leave that feature for another day.
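
Just to give the idea some shape, here is a minimal sketch of such a sparse store. Nothing here exists in RageDB: SparseProperties is a made-up name, and the "<node_id>:<property>" key format is an assumption chosen for this example.

```cpp
#include <any>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical sparse store for properties that only a few nodes have.
struct SparseProperties {
    std::map<std::string, std::any> values;

    // Builds the combined key; the ":" separator is an assumption of this sketch.
    static std::string makeKey(uint64_t node_id, const std::string& property) {
        return std::to_string(node_id) + ":" + property;
    }

    void set(uint64_t node_id, const std::string& property, std::any value) {
        values[makeKey(node_id, property)] = std::move(value);
    }

    // Returns an empty std::any when the node has no such property.
    std::any get(uint64_t node_id, const std::string& property) const {
        auto it = values.find(makeKey(node_id, property));
        return it == values.end() ? std::any() : it->second;
    }

    // Deleting is a plain erase: no bitmap is needed because absence is the default.
    bool remove(uint64_t node_id, const std::string& property) {
        return values.erase(makeKey(node_id, property)) > 0;
    }
};
```

The trade-off is the reverse of the bitmap design: memory is only spent on the nodes that actually have the property, at the cost of a string key lookup on every read.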

On a completely different topic, I started working on the website for RageDB. Take a look at the image below or you can check out the current state by following the link. It’s built on GitHub Pages with Jekyll and Bulma, so it is easy to modify. If you have the time to help out, please do so. The source code is available here, send me a pull request!

Don’t forget to join me on Slack… and follow the project on Twitter under the username @rage_database. Until next time!

https://ragedb.com/