Let’s build something Outrageous – Part 8: Queries with Lua

Jamie Brandon wrote a monster of a blog post the other day against SQL. This is my favorite part:

To take an example close to my heart: Differential dataflow is a dataflow engine that includes support for automatic parallel execution, horizontal scaling and incrementally maintained views. It totals ~16kloc and was mostly written by a single person. Materialize adds support for SQL and various data sources. To date, that has taken ~128kloc (not including dependencies) and I estimate ~15-20 engineer-years. Just converting SQL to the logical plan takes ~27kloc, more than than the entirety of differential dataflow.

Similarly, sqlite looks to have ~212kloc and duckdb ~141kloc. The count for duckdb doesn’t even include the parser that they (sensibly) borrowed from postgres, which at ~47kloc is much larger than the entire ~30kloc codebase for lua.

Jamie Brandon – Against SQL

Yeah. We’re not doing that nonsense. No query language for us thank you very much. Besides, all that work is for nothing. I wrote about it on this blog post, but the gist of it is: If your users do not know the exact details of how the query will be executed by your system, they will write the query in one of 44 different ways and those queries will be translated into 30 different query plans with wildly different performance characteristics.

So instead we’re letting users talk to the graph directly using Lua. How is this going to work you may ask. Well, we first start off with an endpoint: http://localhost:7243/db/rage/lua that takes a POST with a string of Lua code.

We’re going to create a couple of User nodes (Max and Helene) and add a LOVES relationship between them. Now let’s retrieve them by sending NodeGet(“User”, “Max”) in the body of a POST request to /db/rage/lua:

What kind of sorcery is this you may ask? Let’s see how we got there. First we created the http endpoint and all it does is call RunLua on whatever shard got the request and return the string result as json.

future<std::unique_ptr<reply>> Lua::PostLuaHandler::handle(
...
        return parent.graph.shard.invoke_on(this_shard_id(), [body](Shard &local_shard) {
            return local_shard.RunLua(body);
        }).then([rep = std::move(rep)] (const std::string& result) mutable {
...
            rep->write_body("json", sstring(result));
            return make_ready_future<std::unique_ptr<reply>>(std::move(rep));

Alright, so what does RunLua do? It takes our user provided Lua script and splits it into lines:

std::stringstream ss(script);
std::string line;
std::vector<std::string> lines;
while(std::getline(ss,line,'\n')){
    lines.emplace_back(line);
}

Then we add a require json statement to the beginning of the script to load a makeshift json library and replace the last line of their provided script into a return statement that encodes whatever is on that line. We then combine it all back together into a string “executable”.

std::string json_function = "local json = require('json')";
lines.back() = "return json.encode({" + lines.back() + "})";

std::string executable = json_function + join(lines, " ");

So our request becomes:

local json = require('json')
return json.encode({NodeGet("User","Max")})

If the user script had more instructions they would come between the first and last line, but only the last line is returned. I borrowed this idea from how the Lua C API actually works with the “-1” to get the most recently pushed value.

Now that we have our script in the format we want, we can execute it and return a String result from our json encode:

sol::protected_function_result script_result;

this->lua_lock.for_write().lock().get();
try {
    script_result = lua.script(executable, [] (lua_State *, sol::protected_function_result pfr) {
        return pfr;
    });
    this->lua_lock.for_write().unlock();
    if (script_result.valid()) {
        return script_result.get<std::string>();
    }

We only have one Lua VM for each Core, so that is why we lock it during use. I was thinking it may be worth it to create multiple lua states per shard and lock each individually. Something for the TODO list. In the makeshift json library we have this piece of code that says if the type of the object being returned is “userdata” then call its tostring method.

if vtype=='userdata' then
  return tostring(v)
end

We’ve added luajit and connected to it via sol2 as dependencies in our project. We’ve told sol2 about a usertype Node, how to construct one with 3 different constructors and what the methods we are allowing users to call from it.

sol::state lua;
lua.open_libraries(sol::lib::base, sol::lib::package, sol::lib::math, sol::lib::string, sol::lib::table);

lua.new_usertype<Node>("Node", sol::constructors<
      Node(), 
	  Node(uint64_t, std::string, std::string),
      Node(uint64_t, std::string, std::string, std::map<std::string, std::any>)
	>(),
    "getId", &Node::getId,
    "getTypeId", &Node::getTypeId,
	"getType", &Node::getType,
    "getKey", &Node::getKey,
    "getProperties", &Node::getPropertiesLua,
    "getProperty", &Node::getProperty);

One of the neat things we get from Sol2 is “automagic meta functions” specifically:

Support for to_string operations where: 
std::ostream& operator<<( std::ostream&, const T& )
obj.to_string()
to_string( const T& ) exists on the C++ type

So in our C++ Node class we have defined the operator<<, I’ll show part of it here:

std::ostream& operator<<(std::ostream& os, const Node& node) {
        os << "{ \"id\": " << node.id << R"(, "type": ")" << node.type 
           << R"(", "key": )" << "\"" << node.key << "\"" 
           << ", \"properties\": { ";
        bool initial = true;
        for (auto [key, value] : node.properties) {
        ...

The missing part is where it handles properties, take a look at it on the github repository. Ok, how about something more complex than getting a node?

names = {}
ids = NodeGetRelationshipsIds("User", "Max")
for k=1,#ids do
    v = ids[k]
    table.insert(names, NodePropertyGetById(v.node_id, "name"))
end
names

In the Lua script above, we initialize a names “table” to be empty, then we get the ids of the relationships of User Max, and for each of those relationships we insert the name property of the node in the relationship by using its id. Take a look:

Pretty cool right? How about doing a bunch of things and returning them all?

Awesome! No query language, no problem. Until next time!

Tagged , , , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: