Let’s build something Outrageous – Part 6: Relationships

Good relationships are hard. They don’t just happen. They take time, patience and about two thousand lines of code to work together. I’m not convinced I have it right; maybe one of you out there has a better design we can implement. In the original design I was storing the full relationships, complete with starting/ending node ids, properties and type information. This time Relationships are created only temporarily when requested, and we’re just going to store their pieces in different vectors. Let’s dive into the code:

    class RelationshipTypes {
    private:
        std::unordered_map<std::string, uint16_t> type_to_id;
        std::vector<std::string> id_to_type;
        std::vector<std::vector<uint64_t>> starting_node_ids;
        std::vector<std::vector<uint64_t>> ending_node_ids;
        std::vector<Properties> properties;
        std::vector<Roaring64Map> deleted_ids;

We have a type_to_id map to store “KNOWS” and “LIKES” and convert them to 1 and 2, as well as id_to_type for the reverse. Then come a vector of starting node ids and a vector of ending node ids. These could have been a single vector of Links instead, but I had named the Link members node_id and rel_id, so it would be weird to reuse them for two node ids. If I just named them first and second, I would have to remember that they meant different things, so forget it. I could have created a new class of Ids, I guess. Software development is having to make lots of decisions and hoping none of them completely screw you up.
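To make the layout concrete, here is a minimal sketch of the same idea with the Properties and deleted_ids members left out. The class name RelationshipTypesSketch and the methods insertOrGetTypeId, addRelationship, getStartingNodeId and getEndingNodeId are made up for illustration, and reserving slot 0 so that a type id of 0 means "unknown type" is my assumption, not necessarily how the real class does it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, stripped-down stand-in for RelationshipTypes.
class RelationshipTypesSketch {
private:
    std::unordered_map<std::string, uint16_t> type_to_id;
    std::vector<std::string> id_to_type;
    // One inner vector per type id; the position inside it identifies a relationship.
    std::vector<std::vector<uint64_t>> starting_node_ids;
    std::vector<std::vector<uint64_t>> ending_node_ids;

public:
    RelationshipTypesSketch() {
        // Assumption: slot 0 is reserved so that id 0 can mean "no such type".
        id_to_type.emplace_back();
        starting_node_ids.emplace_back();
        ending_node_ids.emplace_back();
    }

    // Returns the id of a type, or 0 if it has not been registered.
    uint16_t getTypeId(const std::string &type) const {
        auto it = type_to_id.find(type);
        return it == type_to_id.end() ? 0 : it->second;
    }

    // Registers the type if needed and returns its id (1, 2, 3, ...).
    uint16_t insertOrGetTypeId(const std::string &type) {
        uint16_t existing = getTypeId(type);
        if (existing > 0) {
            return existing;
        }
        auto id = static_cast<uint16_t>(id_to_type.size());
        type_to_id.emplace(type, id);
        id_to_type.push_back(type);
        starting_node_ids.emplace_back();
        ending_node_ids.emplace_back();
        return id;
    }

    const std::string &getType(uint16_t type_id) const {
        assert(type_id < id_to_type.size());
        return id_to_type[type_id];
    }

    // Stores the two node ids in parallel vectors and returns their position.
    uint64_t addRelationship(uint16_t type_id, uint64_t from, uint64_t to) {
        assert(type_id > 0 && type_id < id_to_type.size());
        starting_node_ids[type_id].push_back(from);
        ending_node_ids[type_id].push_back(to);
        return starting_node_ids[type_id].size() - 1;
    }

    uint64_t getStartingNodeId(uint16_t type_id, uint64_t position) const {
        return starting_node_ids.at(type_id).at(position);
    }

    uint64_t getEndingNodeId(uint16_t type_id, uint64_t position) const {
        return ending_node_ids.at(type_id).at(position);
    }
};
```

Nothing here is a Relationship object. One would only be assembled on request by reading the same position out of each vector.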

We’ll start from the API again and walk our way backwards. Let’s see how we go about creating a new Relationship. We have two HTTP endpoints, one with node ids and one with types and keys. Let’s see the one with types and keys:

:POST /db/{graph}/node/{type_1}/{key_1}/relationship/{type_2}/{key_2}/{rel_type}
JSON formatted Body: {properties}

First thing we have to do is validate all our parameters are not blank:

future<std::unique_ptr<reply>> Relationships::PostRelationshipHandler::handle([[maybe_unused]] const sstring &path, std::unique_ptr<request> req, std::unique_ptr<reply> rep) {
    bool valid_type = Utilities::validate_parameter(Utilities::TYPE, req, rep, "Invalid type");
    bool valid_key = Utilities::validate_parameter(Utilities::KEY, req, rep, "Invalid key");
    bool valid_type2 = Utilities::validate_parameter(Utilities::TYPE2, req, rep, "Invalid type2");
    bool valid_key2 = Utilities::validate_parameter(Utilities::KEY2, req, rep, "Invalid key2");
    bool valid_rel_type = Utilities::validate_parameter(Utilities::REL_TYPE, req, rep, "Invalid relationship type");

    if(valid_type && valid_key && valid_type2 && valid_key2 && valid_rel_type) {

I didn’t show you the validate_parameter method last time, so let’s see it here:

bool Utilities::validate_parameter(const sstring &parameter, std::unique_ptr<request> &req, std::unique_ptr<reply> &rep, std::string message) {
    bool valid_type = req->param.exists(parameter);
    if (!valid_type) {
        rep->write_body("json", json::stream_object(std::move(message)));
        rep->set_status(reply::status_type::bad_request);
    }
    return valid_type;
}

What I like about it is that it sets the bad request status and writes the custom error message for us automatically, so we don’t have to remember to deal with that later. But back to the creation of a relationship. If the request body is empty (a relationship with no properties), we kick off the RelationshipAddEmptyPeered method on the shard handling the request and it will deal with the routing to the right place(s).

        if (req->content.empty()) {
            return parent.graph.shard.local().RelationshipAddEmptyPeered(req->param[Utilities::REL_TYPE], req->param[Utilities::TYPE],req->param[Utilities::KEY], req->param[Utilities::TYPE2], req->param[Utilities::KEY2])
                    .then([rep = std::move(rep), rel_type=req->param[Utilities::REL_TYPE], this] (uint64_t id) mutable {
                        if (id > 0) {
                            return parent.graph.shard.local().RelationshipGetPeered(id).then([rep = std::move(rep), rel_type] (Relationship relationship) mutable {
                                rep->write_body("json", json::stream_object((relationship_json(relationship))));
                                rep->set_status(reply::status_type::created);
                                return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
                            });
                        } else {
                            rep->write_body("json", json::stream_object("Invalid Request"));
                            rep->set_status(reply::status_type::bad_request);
                        }
                        return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
                    });

If the request body is not empty, we need to check that the JSON is valid, because I don’t want to get halfway into creating a relationship and then have to back out because something in the input was malformed. Otherwise the flow is the same. Here is the validation helper:

bool Utilities::validate_json(const std::unique_ptr<request> &req, std::unique_ptr<reply> &rep) {
    simdjson::dom::object object;
    simdjson::error_code error = parser.parse(req->content).get(object);
    if (error) {
        rep->write_body("json", json::stream_object("Invalid JSON"));
        rep->set_status(reply::status_type::bad_request);
    }
    return !error;
}

The issue with this is that I’m parsing the JSON twice: once for validation before I create the relationship, and once again as I create the relationship. Simdjson is blazing fast, so maybe it’s not an issue, but we’ll make sure to test commenting that validation out to see if there is much of a difference in performance. If there is, we’ll find a way to do it just once and pass the results along. Now let’s dive into actually creating the relationship:

    seastar::future<uint64_t> Shard::RelationshipAddPeered(const std::string &rel_type, const std::string &type1, const std::string &key1,
                                                           const std::string &type2, const std::string &key2, const std::string& properties) {
        uint16_t shard_id1 = CalculateShardId(type1, key1);
        uint16_t shard_id2 = CalculateShardId(type2, key2);
        uint16_t rel_type_id = relationship_types.getTypeId(rel_type);

We start off by getting the shard ids of the two nodes involved in the relationship and the relationship type id. Remember the shard that received this request may not even be involved here, so we just need to pass the work along. Our best case scenario is the relationship type already exists and the two nodes in this relationship belong to the same shard. In this case we have a special method to deal with it.

        if (rel_type_id > 0) {
            if(shard_id1 == shard_id2) {
                return container().invoke_on(shard_id1, [rel_type_id, type1, key1, type2, key2, properties](Shard &local_shard) {
                    return local_shard.RelationshipAddSameShard(rel_type_id, type1, key1, type2, key2, properties);
                });
            }

If the two nodes are not in the same shard, we are going to find the id of node 2 on its shard:

            return container().invoke_on(shard_id2, [type1, key1, type2, key2](Shard &local_shard) {
                return local_shard.NodeGetID(type2, key2);
            }).then([shard_id1, shard_id2, rel_type_id, type1, key1, this] (uint64_t id2) {

If it’s valid (greater than zero) then we need to go to the shard of node 1 and get its id:

                if (id2 > 0) {
                    return container().invoke_on(shard_id1, [rel_type_id, type1, key1, id2](Shard &local_shard) {
                        std::vector<uint64_t> ids;
                        uint64_t id1 = local_shard.NodeGetID(type1, key1);

We need to keep track of the node id of the first node, but we also need to try to create the outgoing part of the relationship and keep track of its id, so we’ll create a vector to hold these ids. If the id of node 1 is valid (greater than zero) then we add the relationship and put both the node id and the relationship id on the ids vector.

                        if (id1 > 0) {
                            uint64_t rel_id = local_shard.RelationshipAddEmptyToOutgoing(
                                    rel_type_id, id1, id2);
                            if (rel_id > 0) {
                                ids.push_back(id1);
                                ids.push_back(rel_id);
                            }
                        }

Once the vector with the two ids comes back, we check whether it’s empty, which means we had a problem, and return 0 right away. Otherwise we go back to the shard of the second node and add the second part of the relationship chain, the incoming side, to node 2.

                        return ids;
                    }).then([rel_type_id, shard_id2, id2, this](std::vector<uint64_t> ids) {
                        // if the relationship is valid (and by extension both nodes) then add the second part
                        if (!ids.empty()) {
                            return container()
                            .invoke_on(shard_id2, [rel_type_id, id1 = ids[0], id2, rel_id = ids[1]](
                                    Shard &local_shard) {
                                return local_shard.RelationshipAddToIncoming(rel_type_id, rel_id, id1, id2);
                            });
                        }
                        return seastar::make_ready_future<uint64_t>(uint64_t(0));
                    });

So to sum it up:

  1. Find and validate the node id of node 2 by type and key on the shard of node 2
  2. Find and validate the node id of node 1 by type and key on the shard of node 1
  3. Add the relationship and the outgoing part of the relationship chain on the shard of node 1
  4. Add the incoming part of the relationship chain to the shard of node 2

If the relationship type does not exist when this method is called, it is first created on all the shards and then the same steps follow. If the request had a JSON payload of properties, we parse them and add them during the outgoing part of the relationship chain creation.
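The four steps above can be sketched without Seastar as a plain, single-threaded simulation. Everything here is a simplification for illustration: ShardSketch, GraphSketch, Link, NodeAdd and RelationshipAddEmpty are hypothetical names, the hash-and-modulo shard calculation is an assumption about how a type and key could map to a shard, relationship types and properties are ignored, and relationship ids are only unique within the shard of node 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One entry in a node's relationship chain: the other node and the relationship id.
struct Link {
    uint64_t node_id;
    uint64_t rel_id;
};

// Hypothetical stand-in for a shard; the real one runs on its own core.
struct ShardSketch {
    std::map<std::pair<std::string, std::string>, uint64_t> node_ids; // (type, key) -> id
    std::map<uint64_t, std::vector<Link>> outgoing;                   // node id -> chain
    std::map<uint64_t, std::vector<Link>> incoming;                   // node id -> chain
    uint64_t relationship_count = 0;

    // Returns 0 when the node does not live on this shard.
    uint64_t NodeGetID(const std::string &type, const std::string &key) const {
        auto it = node_ids.find(std::make_pair(type, key));
        return it == node_ids.end() ? 0 : it->second;
    }

    // Step 3: create the relationship and the outgoing half of the chain.
    uint64_t RelationshipAddEmptyToOutgoing(uint64_t id1, uint64_t id2) {
        uint64_t rel_id = ++relationship_count;
        outgoing[id1].push_back(Link{id2, rel_id});
        return rel_id;
    }

    // Step 4: record the incoming half of the chain on the second node.
    uint64_t RelationshipAddToIncoming(uint64_t rel_id, uint64_t id1, uint64_t id2) {
        incoming[id2].push_back(Link{id1, rel_id});
        return rel_id;
    }
};

struct GraphSketch {
    std::vector<ShardSketch> shards;
    uint64_t next_node_id = 1;

    explicit GraphSketch(std::size_t shard_count) : shards(shard_count) {}

    // Assumption: a node's shard is a hash of its type and key modulo the shard count.
    uint16_t CalculateShardId(const std::string &type, const std::string &key) const {
        return static_cast<uint16_t>(std::hash<std::string>{}(type + '/' + key) % shards.size());
    }

    uint64_t NodeAdd(const std::string &type, const std::string &key) {
        ShardSketch &shard = shards[CalculateShardId(type, key)];
        auto result = shard.node_ids.emplace(std::make_pair(type, key), next_node_id);
        if (result.second) {
            ++next_node_id;
        }
        return result.first->second;
    }

    // Returns the new relationship id, or 0 if either node is missing.
    uint64_t RelationshipAddEmpty(const std::string &type1, const std::string &key1,
                                  const std::string &type2, const std::string &key2) {
        ShardSketch &shard1 = shards[CalculateShardId(type1, key1)];
        ShardSketch &shard2 = shards[CalculateShardId(type2, key2)];

        uint64_t id2 = shard2.NodeGetID(type2, key2); // step 1, on the shard of node 2
        if (id2 == 0) return 0;
        uint64_t id1 = shard1.NodeGetID(type1, key1); // step 2, on the shard of node 1
        if (id1 == 0) return 0;
        uint64_t rel_id = shard1.RelationshipAddEmptyToOutgoing(id1, id2); // step 3
        return shard2.RelationshipAddToIncoming(rel_id, id1, id2);         // step 4
    }
};
```

In the real code each step is a hop to another core via invoke_on, which is why node 2 is validated first: nothing is written anywhere until both nodes are known to exist.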

If that sounded confusing… just wait until we get to the hardest part of the entire codebase. Deleting a node. Because you see you can’t just delete a node, you have to delete all of its relationships as well otherwise you end up with “half edges” and that’s a big no-no in graph databases. I am getting ahead of myself however. Let’s test our functionality by creating two nodes and a relationship between them. First let’s create me:

Now, let’s create Helene:

…and finally let’s create a relationship between them:

…and it worked! If you want to see the code in one spot, go to the GitHub repository. If you want to join me on Slack, follow this link and pop in to say hello. I’m further ahead code-wise than blog-wise, so please do check out the source. I could use help with writing test code, commenting, documentation and just testing it, if you are feeling up for it. Until next time.
