Let’s build something Outrageous – Part 12: Finding Things

Knowing yourself is the beginning of all wisdom

Aristotle

At some point in your life you will look at the mirror and not recognize the person staring back at you. It’s not only the way you look, but the choices you made and the results of those choices that will seem puzzling. You may decide right then and there to embark in the most important quest of your life. Finding yourself.

Nodes and Relationships don’t get the luxury of having mirrors, but we still need to find them every once in a while. So far all we are able to do is get a node by a label and key or use the id to get a node or relationship. This is great and covers a lot of use cases, but sometimes the primary id is not what we are looking for. We may be looking for all nodes of a type that share a property value equal to something, or greater/less than something else. How can we go about finding the answers to these types of queries in our graph?

Having never done this before, I don’t know. So we’re going to try something here and see what happens. We need to first start with the types of operations we are going to support. For now, this is our list:

    enum Operation {
            EQ,              // A == B
            NEQ,             // A != B
            GT,              // A  > B
            GTE,             // A >= B
            LT,              // A  < B 
            LTE,             // A <= B
            IS_NULL,         // A is Null
            STARTS_WITH,     // A starts with B
            CONTAINS,        // A contains B
            ENDS_WITH,       // A ends with B
            NOT_IS_NULL,     // A is not Null
            NOT_STARTS_WITH, // A does not start with B
            NOT_CONTAINS,    // A does not contain B
            NOT_ENDS_WITH,   // A does not end with B
            UNKNOWN
    };

We could add case insensitive options to the starts/ends/contains variants, but let’s just keep it simple for now. We’ll figure out how to find Nodes then we can “copy pasta” for the relationships. We’ll create a findNodes method that takes the type of node we are searching for, the name of the property we care about, the operation we are trying to perform, the value we are comparing against and a skip and limit so it can be iterated on. Looks a bit like this:

   std::vector<Node> NodeTypes::findNodes(uint16_t type_id, const std::string &property, Operation operation, std::any value, uint64_t skip, uint64_t limit) {
        std::vector<Node>  nodes;
        if (ValidTypeId(type_id)) {
            uint16_t property_type_id = properties[type_id].getPropertyTypeId(property);
            int current = 1;
            uint64_t max_id = key_to_node_id[type_id].size();

The easiest things to find are IS_NULL and NOT_IS_NULL since we don’t care what type of property it is, we don’t even have to read the property, we just need to know it exists. If you recall earlier posts we keep track of deleted nodes and deleted properties in Roaring Bitmaps, so we can leverage those to help us answer this operation. For example for IS_NULL, we start with a blank bitmap and we add the deleted properties bitmap to it, then we remove the deleted nodes since those don’t count. Whatever is left are node ids of the properties set to null, so we iterate through them and grab the ones we need. Once we have more than the skip+limit we are done.

// Start blank, union with deleted properties, remove deleted nodes
if(operation == Operation::IS_NULL) {
    roaring::Roaring64Map blank;
    blank |= properties[type_id].getDeletedMap(property);
    blank -= getDeletedMap(type_id);
    for (roaring::Roaring64MapSetBitForwardIterator iterator = blank.begin(); iterator != blank.end(); ++iterator) {
        if (current > (skip + limit)) {
            break;
        }
        if (current++ > skip ) {
            nodes.emplace_back(getNode(type_id, *iterator));
        }
    }
    return nodes;
}

Finding properties that are NOT_IS_NULL follows a similar idea. We start with a full bitmap from 0 to max_id, remove the deleted nodes, then remove the deleted properties. Whatever is left has a valid value for that property.

Let’s now move on to the harder ones and try to find a node with a property equal to something. Since we don’t have any indexes we’re going to have to brute force it by looking at every property. We can do so one at a time or in batches. Lets try one at a time first and see where that gets us, we can implement the batching later (or maybe you can, I’d love some help!). We are going to iterate from zero to the max id, return once we get all the nodes we need, skip any deleted nodes and skip any deleted properties.

for (uint64_t internal_id=0; internal_id < max_id; ++internal_id) {
    // If we reached out limit, return
    if (current > (skip + limit)) {
        return nodes;
    }
    // If the node has been deleted, ignore it
    if (deleted_ids[type_id].contains(internal_id)) {
        continue;
    }

    // If the property has been deleted, ignore it
    if (properties[type_id].isDeleted(property, internal_id)) {
        continue;
    }

Before we go much further, we need an Expression class that will evaluate the operation we are trying to perform, the property of the node (a) and the property we are comparing against (b). As much as I love copy and paste, I don’t want to have to do this a billion times, so I’m actually going to use a template for our Evaluate method:

    class Expression {
    public:

        template <typename T>
        static bool Evaluate(Operation operation, T a, T b) {
            switch(operation) {
                case Operation::EQ: {
                    return EQ(a, b);
                }
                case Operation::NEQ: {
                    return NEQ(a, b);
                }
                case Operation::GT: {
                    return GT(a, b);
                }
...

…and more templates for our comparisons:

        template <typename T>
        static bool EQ(T a, T b) {
            return a == b;
        }

        template <typename T>
        static bool NEQ(T a, T b) {
            return a != b;
        }

        template <typename T>
        static bool GT(T a, T b) {
            return a > b;
        }

Look out world, I’m a real C++ programmer now. To use these we need to call them in our loop. For example when looking at an Integer property to compare, we first make sure we are comparing to an integer value, then we pass in “int64_t” for our class to our Evaluate method in Expression and pass in the operation, the property of the node and the value casted from any to int64_t:

case Properties::getIntegerPropertyType(): {
    if (Properties::isIntegerProperty(value)) {
        if (!Expression::Evaluate<int64_t>(operation, properties[type_id].getIntegerProperty(property, internal_id), std::any_cast<int64_t>(value))) {
            continue;
        }
    }
    break;
}

If the evaluation of the expression is not true, we ignore it and continue on to the next value. If it does turn out to be true, we break out of the switch statement, add our valid node and increment our current count:

if (current > skip) {
    nodes.emplace_back(getNode(type_id, internal_id));
}
current++;

Alright, let’s test this out in our gatling based benchmark by creating one million nodes with a number property between 1 and 10.

"number" -> usFaker.number().numberBetween(1,100000),

Then once that’s done, we’ll go find them. We should have about 10 nodes with each number property, so to make this a hard test, we’re going to return 10 but skip the first 5. That will force the database to go through all 1 million node properties.

  val fakeFeeder: Iterator[Map[String, Any]] = Iterator.continually(
    Map(
      "number" -> usFaker.number().numberBetween(1,100000)
    )
  )
  ...
  def createPostRequest: HttpRequestBuilder = {
    http("FindNodeFakeProperties")
      .post("/db/" + params.rageDB + "/nodes/Address/number/EQ?limit=10&skip=5")
      .body(StringBody(   """${number}"""))
      .asJson
      .check(status.is(200))
  }

We run our test and here are the results:

================================================================================
---- Global Information --------------------------------------------------------
> request count                                       1948 (OK=1948   KO=0     )
> min response time                                    278 (OK=278    KO=-     )
> max response time                                   3108 (OK=3108   KO=-     )
> mean response time                                   992 (OK=992    KO=-     )
> std deviation                                        335 (OK=335    KO=-     )
> response time 50th percentile                        995 (OK=995    KO=-     )
> response time 75th percentile                       1178 (OK=1178   KO=-     )
> response time 95th percentile                       1585 (OK=1585   KO=-     )
> response time 99th percentile                       1853 (OK=1853   KO=-     )
> mean requests/sec                                 31.934 (OK=31.934 KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                           518 ( 27%)
> 800 ms < t < 1200 ms                                 989 ( 51%)
> t > 1200 ms                                          441 ( 23%)
> failed                                                 0 (  0%)
================================================================================

About 32 requests per second. It takes us about 1 second to get our answer and our max is 3 seconds. That’s not a bad start, but I think we can do better. One of the things we’re doing is calling getIntegerProperty in our loop:

properties[type_id].getIntegerProperty(property, internal_id)

This method is doing a bunch of check work which make sense when being called randomly but not from this internal search method.

    int64_t Properties::getIntegerProperty(const std::string& key, uint64_t index) {
        auto search = integers.find(key);

        if (search != integers.end()) {
            if (index < integers[key].size() ) {
                return integers[key][index];
            }
        }
        return tombstone_int;
    }

I’m going to make “integers” public and trade it for:

properties[type_id].integers[property][internal_id]

Lets recompile, recreate our nodes and try the benchmark again:

================================================================================
---- Global Information --------------------------------------------------------
> request count                                       2840 (OK=2840   KO=0     )
> min response time                                    169 (OK=169    KO=-     )
> max response time                                   2250 (OK=2250   KO=-     )
> mean response time                                   677 (OK=677    KO=-     )
> std deviation                                        333 (OK=333    KO=-     )
> response time 50th percentile                        636 (OK=636    KO=-     )
> response time 75th percentile                        849 (OK=849    KO=-     )
> response time 95th percentile                       1360 (OK=1360   KO=-     )
> response time 99th percentile                       1529 (OK=1529   KO=-     )
> mean requests/sec                                 46.557 (OK=46.557 KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                          2007 ( 71%)
> 800 ms < t < 1200 ms                                 592 ( 21%)
> t > 1200 ms                                          241 (  8%)
> failed                                                 0 (  0%)
================================================================================

Nice. We jumped from 32 to 46 queries per second, dropped our mean to about 2/3 of a second and our max is 2.2 seconds. But something else is bothering me about this query. For every single node I am checking if the value is a of a particular type and I am casting that value to the type we care about. But I should only ever have to do this once per loop, not every single time.

if (Properties::isIntegerProperty(value)) {
...
std::any_cast<int64_t>(value)

Let’s avoid that extra work. We’ll create a void pointer to hold our value, check and cast into it once and just reinterpret cast in the loop.

void* valuePointer;

switch (property_type_id) {
    ...
    case Properties::getIntegerPropertyType(): {
        if (Properties::isIntegerProperty(value)) {
valuePointer = std::any_cast<int64_t>(&value);
break;
        }
        return nodes;
    }
    ...
    switch (property_type_id) {
    ...     
        case Properties::getIntegerPropertyType(): {
if (!Expression::Evaluate<int64_t>(operation, properties[type_id].integers[property][internal_id], *reinterpret_cast<int64_t*>(valuePointer))) {
    continue;
}

We will recompile again and redo our benchmark and what do we get:

================================================================================
---- Global Information --------------------------------------------------------
> request count                                       3474 (OK=3474   KO=0     )
> min response time                                    139 (OK=139    KO=-     )
> max response time                                   2171 (OK=2171   KO=-     )
> mean response time                                   554 (OK=554    KO=-     )
> std deviation                                        276 (OK=276    KO=-     )
> response time 50th percentile                        450 (OK=450    KO=-     )
> response time 75th percentile                        660 (OK=660    KO=-     )
> response time 95th percentile                       1141 (OK=1141   KO=-     )
> response time 99th percentile                       1336 (OK=1336   KO=-     )
> mean requests/sec                                 56.951 (OK=56.951 KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                          2833 ( 82%)
> 800 ms < t < 1200 ms                                 530 ( 15%)
> t > 1200 ms                                          111 (  3%)
> failed                                                 0 (  0%)
================================================================================

Double Nice. We’ve gone from 32 to 57 requests per second, our mean time is down to about half a second and our max… well our max is still hanging out just over 2 seconds but that’s better than the 3 we started with. But there still things I don’t like. For example the way we access the values in the vectors. We have to get the map entry via the property string then fetch the index. Let’s get another point to just the vector and skip finding the property in the map every time:

properties[type_id].integers[property][internal_id]
vs
integerVector[internal_id]

…and one more time:

================================================================================
---- Global Information --------------------------------------------------------
> request count                                       3925 (OK=3925   KO=0     )
> min response time                                    122 (OK=122    KO=-     )
> max response time                                   1703 (OK=1703   KO=-     )
> mean response time                                   490 (OK=490    KO=-     )
> std deviation                                        219 (OK=219    KO=-     )
> response time 50th percentile                        475 (OK=475    KO=-     )
> response time 75th percentile                        582 (OK=582    KO=-     )
> response time 95th percentile                        967 (OK=967    KO=-     )
> response time 99th percentile                       1152 (OK=1152   KO=-     )
> mean requests/sec                                 64.344 (OK=64.344 KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                          3579 ( 91%)
> 800 ms < t < 1200 ms                                 324 (  8%)
> t > 1200 ms                                           22 (  1%)
> failed                                                 0 (  0%)
================================================================================

Got another 7 requests per second, cut our mean to under half a second and our max is now hovering around 1.7 seconds. But the code looks like a mess. I have two switch statements, pointers, too many ifs. Let’s try again and simplify this by first getting our deleted nodes and properties into a single roaring bitmap. Second by “type” we’ll any_cast the value into the right type. Third, we will get a reference to the vector of values we are going to be checking, and fourth we’ll iterate over this vector. So the code looks like this:

roaring::Roaring64Map blank;
blank |= properties[type_id].getDeletedMap(property);
blank |= getDeletedMap(type_id);
			
switch (property_type_id) {
   case Properties::getIntegerPropertyType(): {
        if (Properties::isIntegerProperty(value)) { 
    
            const int64_t integerValue = std::any_cast<int64_t>(value);
            const std::vector<int64_t> &vec = properties[type_id].integers[property];
            for(unsigned internal_id = 0; internal_id < vec.size(); ++internal_id) {
                // If we reached out limit, return
                if (current > (skip + limit)) {
                    return nodes;
                }
                // If the node or property has been deleted, ignore it
                if (blank.contains(internal_id)) {
                    continue;
                }
                if (!Expression::Evaluate<int64_t>(operation, vec[internal_id], integerValue)) {
                    continue;
                }
                 // If it is true add it if we are over the skip, otherwise ignore it
                if (current > skip) {
                    nodes.emplace_back(getNode(type_id, internal_id));
                }

                 current++;
            }
        return nodes;
        }
    }						

At least it makes more sense. Let’s see what happens when we run our benchmarks:

================================================================================
---- Global Information --------------------------------------------------------
> request count                                      36694 (OK=36694  KO=0     )
> min response time                                     10 (OK=10     KO=-     )
> max response time                                    132 (OK=132    KO=-     )
> mean response time                                    52 (OK=52     KO=-     )
> std deviation                                         10 (OK=10     KO=-     )
> response time 50th percentile                         51 (OK=51     KO=-     )
> response time 75th percentile                         59 (OK=59     KO=-     )
> response time 95th percentile                         69 (OK=69     KO=-     )
> response time 99th percentile                         85 (OK=85     KO=-     )
> mean requests/sec                                601.541 (OK=601.541 KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                         36694 (100%)
> 800 ms < t < 1200 ms                                   0 (  0%)
> t > 1200 ms                                            0 (  0%)
> failed                                                 0 (  0%)
================================================================================

What? It’s 10 times better. We’re now all the way up to 600 requests per second, our mean is down to 50ms and our max is under 150ms. That is sooooo much better. Ok, so it looks like I have no clue what I’m doing.

UPDATE: I didn’t like having to make the internal pieces of Properties public, so I cleaned it up some by adding reference functions. I was only working on int64_t but I finished the rest and on my latest runs it looks like we got another 20% bump in performance:

================================================================================
---- Global Information --------------------------------------------------------
> request count                                      44170 (OK=44170  KO=0     )
> min response time                                     13 (OK=13     KO=-     )
> max response time                                    134 (OK=134    KO=-     )
> mean response time                                    43 (OK=43     KO=-     )
> std deviation                                         13 (OK=13     KO=-     )
> response time 50th percentile                         41 (OK=41     KO=-     )
> response time 75th percentile                         49 (OK=49     KO=-     )
> response time 95th percentile                         69 (OK=68     KO=-     )
> response time 99th percentile                         81 (OK=81     KO=-     )
> mean requests/sec                                724.098 (OK=724.098 KO=-     )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms                                         44170 (100%)
> 800 ms < t < 1200 ms                                   0 (  0%)
> t > 1200 ms                                            0 (  0%)
> failed                                                 0 (  0%)
================================================================================

If any of you C++ folks like performance puzzles and can help make this faster, I’m all ears. Please join me on Slack.

In the mean time I’m going to investigate a way to vectorize it so we can check multiple node properties in batches instead of doing things one at a time. Until next time!

Tagged , , , , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: