Watch your Language

I was watching The Marvelous Mrs. Maisel, and one of the more jarring things about the first two episodes is that Midge keeps getting arrested for the things she says. It reminded me of the song “Me So Horny” by 2 Live Crew, which landed the hip hop group in jail on obscenity charges. Record store owners were getting arrested for selling CDs to undercover cops. How insane does that all sound? But it drives home the point that language matters. The things we say and how we say them are powerful. They convey meaning and emotion. Language can be pleasant or it can be foul.

Like many people, I didn’t really enjoy programming until I found a language and framework that resonated with me. I’m speaking of Ruby on Rails, which was optimized for “programmer happiness”. Your own experience will vary and that’s ok. There are hundreds if not thousands of computer languages out there to choose from, each offering something special to its audience. But why is it that when it comes to query languages, 98% or so use SQL?

That language, with all its dialects, is bolted onto almost all Relational Databases… which oddly enough are not actually Relational, since a truly relational database has no nulls and no duplicates. I actually learned this on my job at RelationalAI, which builds a real relational database that doesn’t allow nulls or duplicates. But I digress. There was a short period of time during the NoSQL movement when new query languages had a window to enter the field and fight for a different way of interfacing with data. But then someone put “SQL on Hadoop” and slammed that window shut. What could have been…

Anyway, RageDB needs a language to talk to graphs. I started out with horrifically long method names like “NodeGetRelationshipsByIdForDirectionForTypes” and thankfully found a way to overload the method into the much simpler “NodeGetRelationships”. I thought I could overload my way into a simpler language, but then I ran into a problem with bulk methods. Methods like “NodesGet” should be able to take either a list of numbers or a list of links and retrieve the nodes:

    lua.set_function("NodesGet", sol::overload(
        [this](std::vector<uint64_t> ids) { return this->NodesGetViaLua(ids); },
        [this](std::vector<Link> links) { return this->NodesGetByLinksViaLua(links); }
    ));

But unfortunately this doesn’t work, since sol2 does not currently support overloading on vectors. There is a workaround… I can inspect the first element in the list, figure out what it is, and then call the appropriate method. But the thought of having to do this for dozens of methods just didn’t sit well with me. So, I’m going to do the unthinkable and pay the sol2 maintainers to add this feature. It’s actually not that hard (paying, I mean). You just open up your wallet and pay for open source work. You should all try it. If everyone did this, the world would be a better place.
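To make that workaround concrete, here is a rough sketch of the "peek at the first element" dispatch in plain C++. It is not RageDB or sol2 code: the `Element` variant is a made-up stand-in for a value coming out of a Lua list, the `Link` struct and its fields are hypothetical, and the two `NodesGetBy...` functions are placeholders that only report which path was taken.

```cpp
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for RageDB's Link type (fields are assumptions).
struct Link {
    std::uint64_t rel_id;
    std::uint64_t node_id;
};

// Hypothetical stand-in for one element of a Lua list: an id or a Link.
using Element = std::variant<std::uint64_t, Link>;

// Placeholders for the two real bulk methods; they only report the path taken.
std::string NodesGetByIds(const std::vector<std::uint64_t>& ids) {
    return "ids:" + std::to_string(ids.size());
}

std::string NodesGetByLinks(const std::vector<Link>& links) {
    return "links:" + std::to_string(links.size());
}

// Inspect the first element, then route the whole list to the matching method.
std::string NodesGet(const std::vector<Element>& items) {
    if (items.empty()) {
        return NodesGetByIds({});
    }
    if (std::holds_alternative<Link>(items.front())) {
        std::vector<Link> links;
        links.reserve(items.size());
        for (const auto& item : items) {
            // std::get throws std::bad_variant_access if the list is mixed.
            links.push_back(std::get<Link>(item));
        }
        return NodesGetByLinks(links);
    }
    std::vector<std::uint64_t> ids;
    ids.reserve(items.size());
    for (const auto& item : items) {
        ids.push_back(std::get<std::uint64_t>(item));
    }
    return NodesGetByIds(ids);
}
```

Every bulk method would need its own copy of this routing, which is exactly the repetition that makes the workaround unappealing.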

Something else I learned at RelationalAI is how much nicer it is to build queries like Lego building blocks. I wrote a 3 post series about this, and it is amazing… no more 100 line SQL queries to hold in your head. So that got me thinking. RageDB uses Lua, a full-on scripting language, so surely I can do something along those lines. I haven’t thought this through all the way, so I did the simplest (and probably dumbest) thing I could think of. Here is what I did:

I added a method that checks whether the first 5 characters of the Lua body passed in are the comment “--ALL”:

bool forAll(std::string const &str) {
    if (str.length() < 5) {
        return false;
    }
    return str.compare(0,5,"--ALL") == 0;
}
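A few sample inputs show what this prefix test does and does not accept. This is only a sketch: it repeats the function above so that it stands alone, and `routeFor` is a hypothetical helper added purely for illustration, not part of RageDB.

```cpp
#include <string>

// Same check as above: true only when the body starts with "--ALL".
bool forAll(std::string const &str) {
    if (str.length() < 5) {
        return false;
    }
    return str.compare(0, 5, "--ALL") == 0;
}

// Hypothetical helper: names the path a request body would take.
std::string routeFor(std::string const &body) {
    return forAll(body) ? "broadcast" : "normal";
}
```

With this, `routeFor("--ALL\nx = 1")` gives "broadcast", while `routeFor(" --ALL\nx = 1")` and `routeFor("--all")` both give "normal", because the marker is case sensitive and has to be the very first thing in the body. Any comment that merely begins with those five characters, such as `--ALLOWED = true`, is also treated as a broadcast.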

If forAll comes back false, I execute the Lua code as normal, but if it comes back true, then I hijack the Lua body and pass it to all the shards in the database. This makes sure all the Lua VMs run this code and just return a dummy string. I check the executed results from one of the shards and either send back the exception or “Sent to All”:

        if(forAll(body)) {
            body.append("\n\"Sent to All\"");

            return parent.graph.shard.map([body](ragedb::Shard &local_shard) {
                return local_shard.RunAdminLua(body);
            }).then([] (std::vector<std::string> results) {
                return results[0];
            }).then([rep = std::move(rep)] (const std::string& result) mutable {
                if(result.rfind(EXCEPTION,0) == 0) {
                    rep->write_body("html", seastar::sstring(result));
                    rep->set_status(seastar::reply::status_type::bad_request);
                    return seastar::make_ready_future<std::unique_ptr<seastar::httpd::reply>>(std::move(rep));
                }

                rep->write_body("json", seastar::sstring(result));
                return seastar::make_ready_future<std::unique_ptr<seastar::httpd::reply>>(std::move(rep));
            });
        }

Since each Shard only has one Lua VM at the moment, this works. If multiple Lua VMs are eventually allowed per shard, I’ll have to make sure to update this code. So what do we win by doing this? For one, we get “stored procedures”. The first LDBC Social Network Benchmark interactive short query can be sent once as:

--ALL
ldbc_snb_is01 = function(person_id)
    local properties = NodeGetProperties("Person", person_id)
    local city = NodeGetNeighbors("Person", person_id, Direction.OUT, "IS_LOCATED_IN")[1]
    local result = {
        ["person.firstName"] = properties["firstName"],
        ["person.lastName"] = properties["lastName"],
        ["person.birthday"] = properties["birthday"],
        ["person.locationIP"] = properties["locationIP"],
        ["person.browserUsed"] = properties["browserUsed"],
        ["city.id"] = city:getProperty("id"),
        ["person.gender"] = properties["gender"],
        ["person.creationDate"] = date(properties["creationDate"]):fmt("${iso}Z")
    }

    return result
end

Then forever after we can just send ldbc_snb_is01("1234") to run it. That means we have less code to send and parse, but more importantly, the defined function can now be part of the hot code and be compiled after it has run a few thousand times, like you would normally see with a database query.
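Why the definition has to reach every shard can be shown with a toy model. The sketch below is not RageDB code: `ToyVm` is a made-up stand-in for one shard's Lua VM, reduced to a table of named global functions, and it assumes that a later call may be handled by any shard's VM.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

using ToyFunction = std::function<std::string(const std::string&)>;

// Toy stand-in for a shard's Lua VM: just a table of named global functions.
struct ToyVm {
    std::unordered_map<std::string, ToyFunction> globals;

    // Calling an undefined global fails, much like calling nil would in Lua.
    std::string call(const std::string& name, const std::string& arg) const {
        auto it = globals.find(name);
        if (it == globals.end()) {
            throw std::runtime_error("undefined global: " + name);
        }
        return it->second(arg);
    }
};

// Define a global on every VM, the way an --ALL script reaches every shard.
void defineOnAll(std::vector<ToyVm>& vms, const std::string& name, ToyFunction fn) {
    for (auto& vm : vms) {
        vm.globals[name] = fn;
    }
}
```

A function registered through `defineOnAll` can be called on any of the toy VMs, while one assigned to a single VM's `globals` throws when another VM is asked to run it.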

We can also find places where it makes sense to capture some repeated part of a query in a global function, for example getting the city where a person lives. We can send this chunk to all the cores:

--ALL
get_city = function(person_id) 
    return NodeGetNeighbors("Person", person_id, Direction.OUT, "IS_LOCATED_IN")[1]
end

Then in our queries we can just reuse it:

--ALL
ldbc_snb_is01 = function(person_id)
    local properties = NodeGetProperties("Person", person_id)
    local city = get_city(person_id)
    ...

Anyway, this new functionality doesn’t solve all my language woes, but I think it is a positive step forward for the usability of RageDB. If you have any good recommendations or want to help out, please get in touch.
