Let’s build something Outrageous – Part 5: Properties

Whatever your views on the game Monopoly, you play by getting properties, changing them and making sure anybody who uses them pays a tidy sum. That’s also true of graph databases. Finding a property, changing a property, filtering on a property and sometimes even retrieving properties can be really expensive. Part of the issue is that it was decided at some point that it would be great if nodes of the same label could have different properties and, if they had the same property keys, they didn’t necessarily have to have the same property types. This means one Person node could have a height property and another not. One node’s height property could be an integer representing inches or centimeters, while another could use a float and another just write out “170 cm” or “5 foot 6” as strings. Dealing with properties was left completely to the developer. Many times, there isn’t even an option to enforce property types.

This “schema-less/optional” feature continues to cause major performance issues for graph databases and prevents many use cases from being efficiently executed. Just like the two-way door of multiple labels that we walked through, we are going to backtrack on dynamic properties and property types. In RageDB, every Node and Relationship of a given Type will have the same set of Properties, and each Property will have the same data type. In practice this is already the case for most production deployments, so we’re just making it formal and taking advantage of the performance gains. Each Node or Relationship Type will have its own Properties. At this point we will only allow 8 types of properties: booleans, integers, doubles, strings and lists of each. We can add smaller integer types, floats, and other things in the future.

    class Properties {
    private:
        tsl::sparse_map<std::string, uint8_t> types;
        tsl::sparse_map<std::string, uint8_t> type_map;
        std::vector<std::string> allowed_types;
        tsl::sparse_map<std::string, std::vector<bool>> booleans;
        tsl::sparse_map<std::string, std::vector<int64_t>> integers;
        tsl::sparse_map<std::string, std::vector<double>> doubles;
        tsl::sparse_map<std::string, std::vector<std::string>> strings;
        tsl::sparse_map<std::string, std::vector<std::vector<bool>>> booleans_list;
        tsl::sparse_map<std::string, std::vector<std::vector<int64_t>>> integers_list;
        tsl::sparse_map<std::string, std::vector<std::vector<double>>> doubles_list;
        tsl::sparse_map<std::string, std::vector<std::vector<std::string>>> strings_list;

If there is a smarter way to do this, I am all ears, so please chime in. We’ll have a list of the allowed types, leaving the blank type as an error marker. We don’t want any node or relationship having an empty property key, so we’ll start off types with that empty key and a type id of zero, which is also an error marker. Finally we’ll populate a type map with 1-8 for each type:

        allowed_types = {"", "boolean", "integer", "double", "string", "boolean_list", 
                         "integer_list", "double_list", "string_list"};

        types.insert({"", 0});

        type_map = {
                {"boolean",      boolean_type},
                {"integer",      integer_type},
                {"double",       double_type},
                {"string",       string_type},
                {"boolean_list", boolean_list_type},
                {"integer_list", integer_list_type},
                {"double_list",  double_list_type},
                {"string_list",  string_list_type}
        };
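The excerpts never show where boolean_type and friends come from, so here is a minimal sketch of how the two lookup tables could line up. The constant values are an assumption based on the "1-8" remark above, std::unordered_map stands in for tsl::sparse_map to keep the sketch dependency-free, and typeIdFor, typeTablesAgree and demoTypeLookup are hypothetical helpers that are not part of RageDB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed definitions: the excerpts use these names without showing them.
// Each value is the index of the matching name in allowed_types; 0 is the error marker.
constexpr uint8_t boolean_type      = 1;
constexpr uint8_t integer_type      = 2;
constexpr uint8_t double_type       = 3;
constexpr uint8_t string_type       = 4;
constexpr uint8_t boolean_list_type = 5;
constexpr uint8_t integer_list_type = 6;
constexpr uint8_t double_list_type  = 7;
constexpr uint8_t string_list_type  = 8;

const std::vector<std::string> allowed_types = {
        "", "boolean", "integer", "double", "string", "boolean_list",
        "integer_list", "double_list", "string_list"};

// std::unordered_map stands in for tsl::sparse_map here.
const std::unordered_map<std::string, uint8_t> type_map = {
        {"boolean",      boolean_type},
        {"integer",      integer_type},
        {"double",       double_type},
        {"string",       string_type},
        {"boolean_list", boolean_list_type},
        {"integer_list", integer_list_type},
        {"double_list",  double_list_type},
        {"string_list",  string_list_type}
};

// Hypothetical helper: translate a type name to its id, or 0 for an unknown name.
uint8_t typeIdFor(const std::string &type) {
    auto search = type_map.find(type);
    if (search == type_map.end()) {
        return 0;
    }
    return search->second;
}

// Hypothetical check: every id in type_map must index its own name in allowed_types.
bool typeTablesAgree() {
    for (const auto &entry : type_map) {
        const auto index = static_cast<std::size_t>(entry.second);
        if (index >= allowed_types.size() || allowed_types[index] != entry.first) {
            return false;
        }
    }
    return true;
}

// Usage: valid names map to 1-8, anything else maps to the error marker.
void demoTypeLookup() {
    assert(typeIdFor("double") == double_type);
    assert(typeIdFor("float") == 0); // floats are not an allowed type yet
    assert(typeTablesAgree());
}
```

Keeping the ids equal to the positions in allowed_types is what lets a single vector lookup turn an id back into a name.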

With these things in place, when we want to get the Schema of any node or relationship type we can iterate through the types that have been set, convert their ids into strings and return a map of them after deleting the blank key.

    std::map<std::string, std::string> Properties::getPropertyTypes() {
        std::map<std::string, std::string> map;
        for (auto [type, type_id]: types) {
            map.insert({type, allowed_types[type_id]});
        }
        map.erase("");
        return map;
    }
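To see the id-to-name conversion in isolation, here is a small standalone sketch of the same idea. It is a free function rather than the member function shown above, it uses std::unordered_map in place of tsl::sparse_map, and the names propertyTypesOf and demoUserSchema, along with the two type id constants, are assumptions made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed ids: positions of "integer" and "string" in allowed_types.
constexpr uint8_t integer_type = 2;
constexpr uint8_t string_type  = 4;

const std::vector<std::string> allowed_types = {
        "", "boolean", "integer", "double", "string", "boolean_list",
        "integer_list", "double_list", "string_list"};

// Hypothetical free-function version of getPropertyTypes: turns the
// key -> type id map into a sorted key -> type name map.
std::map<std::string, std::string> propertyTypesOf(
        const std::unordered_map<std::string, uint8_t> &types) {
    std::map<std::string, std::string> schema;
    for (const auto &entry : types) {
        if (entry.first.empty()) {
            continue; // skip the blank error-marker key instead of erasing it later
        }
        // at() throws std::out_of_range if an id is outside allowed_types
        schema.emplace(entry.first, allowed_types.at(entry.second));
    }
    return schema;
}

// Usage: a User type with name:string and age:integer.
void demoUserSchema() {
    const std::unordered_map<std::string, uint8_t> types = {
            {"", 0}, {"name", string_type}, {"age", integer_type}};
    const auto schema = propertyTypesOf(types);
    assert(schema.size() == 2);
    assert(schema.at("name") == "string");
    assert(schema.at("age") == "integer");
}
```

Skipping the blank key inside the loop gives the same result as inserting it and erasing it afterwards. Which one reads better is a matter of taste.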

So how do we set for example the “name” property to be a “string”? Well first we need to figure out what type we want, so we look it up in the type_map we initialized above.

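    uint8_t Properties::setPropertyType(const std::string &key, const std::string &type) {
        // What type do we want? Unknown type names leave type_id at 0, the error marker.
        uint8_t type_id = 0;
        auto type_search = type_map.find(type);
        if (type_search != type_map.end()) {
            // Reuse the iterator instead of looking the type up a second time
            type_id = type_search->second;
        }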

If we selected a bad type we end up with type_id = 0, and fail out.

        if (type_id == 0) {
            return 0;
        }

Next we check to see if this property key already has a type. If it has the same type we asked for, we just return it; if it has a different type, we fail out:

        // See if we already have it set
        auto existing = types.find(key);
        if (existing != types.end()) {
            if (existing->second == type_id) {
                return type_id;
            }
            // The property already exists with a different type than we asked for
            return 0;
        }

At this point, it’s a valid type and the property doesn’t already exist, so let’s add it, add an empty vector for the key to the map of the corresponding type, and we’re done:

        types.emplace(key, type_id);

        addPropertyTypeVectors(key, type_id);

        return type_id;
    }
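Putting the three steps together, here is a trimmed-down sketch that compiles on its own. MiniProperties is a hypothetical stand-in for the real Properties class: it only knows two types, uses std::unordered_map instead of tsl::sparse_map, and inlines the vector creation. The type id values and the has...Column helpers are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed ids, matching the positions in allowed_types (only two types are modeled).
constexpr uint8_t integer_type = 2;
constexpr uint8_t string_type  = 4;

// Hypothetical, trimmed-down stand-in for the Properties class.
class MiniProperties {
public:
    MiniProperties() {
        types.emplace("", 0); // blank key with id 0 is the error marker
        type_map = {{"integer", integer_type}, {"string", string_type}};
    }

    // Returns the type id on success, or 0 on failure.
    uint8_t setPropertyType(const std::string &key, const std::string &type) {
        auto wanted = type_map.find(type);
        if (wanted == type_map.end()) {
            return 0; // unknown type name
        }
        const uint8_t type_id = wanted->second;

        auto existing = types.find(key);
        if (existing != types.end()) {
            // Same type: nothing to do. Different type: refuse to change it.
            return existing->second == type_id ? type_id : uint8_t{0};
        }

        types.emplace(key, type_id);
        // Inlined stand-in for addPropertyTypeVectors
        if (type_id == integer_type) {
            integers.emplace(key, std::vector<int64_t>());
        } else {
            strings.emplace(key, std::vector<std::string>());
        }
        return type_id;
    }

    bool hasIntegerColumn(const std::string &key) const { return integers.count(key) == 1; }
    bool hasStringColumn(const std::string &key) const { return strings.count(key) == 1; }

private:
    std::unordered_map<std::string, uint8_t> types;
    std::unordered_map<std::string, uint8_t> type_map;
    std::unordered_map<std::string, std::vector<int64_t>> integers;
    std::unordered_map<std::string, std::vector<std::string>> strings;
};

// Usage, mirroring a User type with name:string and age:integer.
void demoSetPropertyType() {
    MiniProperties user;
    assert(user.setPropertyType("name", "string") == string_type);
    assert(user.setPropertyType("age", "integer") == integer_type);
    assert(user.setPropertyType("name", "string") == string_type); // repeat is fine
    assert(user.setPropertyType("name", "integer") == 0);          // type clash
    assert(user.setPropertyType("", "string") == 0);               // blank key rejected
}
```

Notice that the blank key seeded with id 0 makes the "already exists with another type" branch reject empty property keys without any extra check.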

In case you are wondering what addPropertyTypeVectors does, it’s just a switch statement. Here is an excerpt:

            case string_type: {
                strings.emplace(key, std::vector<std::string>());
                break;
            }
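For the curious, the whole switch could look something like the sketch below. This is not the actual RageDB code: the Columns struct, the Column alias, the bool return value and the constant values are assumptions, and std::unordered_map again stands in for tsl::sparse_map.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed ids, matching the positions in allowed_types.
constexpr uint8_t boolean_type      = 1;
constexpr uint8_t integer_type      = 2;
constexpr uint8_t double_type       = 3;
constexpr uint8_t string_type       = 4;
constexpr uint8_t boolean_list_type = 5;
constexpr uint8_t integer_list_type = 6;
constexpr uint8_t double_list_type  = 7;
constexpr uint8_t string_list_type  = 8;

// Hypothetical alias: one vector of values per property key.
template <typename T>
using Column = std::unordered_map<std::string, std::vector<T>>;

// Hypothetical holder for the eight maps kept by the Properties class.
struct Columns {
    Column<bool> booleans;
    Column<int64_t> integers;
    Column<double> doubles;
    Column<std::string> strings;
    Column<std::vector<bool>> booleans_list;
    Column<std::vector<int64_t>> integers_list;
    Column<std::vector<double>> doubles_list;
    Column<std::vector<std::string>> strings_list;
};

// Creates an empty vector for key in the map that matches type_id.
// Returns false for an unknown type id (an assumption of this sketch).
bool addPropertyTypeVectors(Columns &c, const std::string &key, uint8_t type_id) {
    switch (type_id) {
        case boolean_type:      c.booleans.emplace(key, std::vector<bool>()); return true;
        case integer_type:      c.integers.emplace(key, std::vector<int64_t>()); return true;
        case double_type:       c.doubles.emplace(key, std::vector<double>()); return true;
        case string_type:       c.strings.emplace(key, std::vector<std::string>()); return true;
        case boolean_list_type: c.booleans_list.emplace(key, std::vector<std::vector<bool>>()); return true;
        case integer_list_type: c.integers_list.emplace(key, std::vector<std::vector<int64_t>>()); return true;
        case double_list_type:  c.doubles_list.emplace(key, std::vector<std::vector<double>>()); return true;
        case string_list_type:  c.strings_list.emplace(key, std::vector<std::vector<std::string>>()); return true;
        default:                return false; // 0 or anything above 8
    }
}

// Usage: a string property lands in strings and nowhere else.
void demoAddPropertyTypeVectors() {
    Columns user;
    assert(addPropertyTypeVectors(user, "name", string_type));
    assert(user.strings.count("name") == 1);
    assert(user.integers.empty());
    assert(!addPropertyTypeVectors(user, "oops", 0));
}
```

Since emplace does nothing when the key is already present, calling the function twice for the same key leaves the existing vector untouched.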

If you want to see the Properties.h and Properties.cpp in all their glorious splendor, go back and click on them.

Let’s try this out. First we’ll create a User type, then add name:string and age:integer property types to it:

Excellent, those all seem to work just fine. It would be nicer however if I could create the Type and add the properties in a single command. We’ll add it to the TODO list. Now let’s create a node with a name and age:

…and we get:

What if we try to GET the node now?

We get the exact same thing back! You may be surprised to know it took like 1000 lines of code to get this part working. I can’t show it all here or you’d be bored to tears. I have many more lines of code to go still, but if you want to help me code, add some tests, refactor or comment things, correct my many mistakes, or just say hello then you can do so on the RageDB Slack while we wait for the next blog post.

Max De Marzi

Graphs, Graphs, and nothing but the Graphs