Let’s build something Outrageous – Part 19: LDBC Short Queries

The folks who build the database are not the same folks who use the database and that causes problems. It has been my number 1 complaint for the past decade or so. People building features in isolation can’t see the forest for the trees and the end user experience suffers. I ran into this video from Molham Aref where he puts it quite nicely:

You are making decisions you don’t suffer from. That’s the line I need to be said at every stand up, at every product manager meeting, so the people in the room would at least think about having some empathy for those of us in the field who do suffer from their decisions. Alright enough of that. I’m putting my big boy pants on now and writing some queries.

In the last blog post we imported the LDBC Social Network dataset. After working with it for a bit I decided to make a minor change. I added the “id” property as an actual property of the node, not just the key. Seems a bit redundant but it made writing the queries easier. Take a look at the LDBC Reference if you want to follow along, but we’re jumping to the middle and starting with the easy seven “Short Queries”. Naturally you’d think we’d start with the first one… but instead we’re going to go from easiest to hardest to write. So let’s start with query 5:

This query is pretty straight forward. Given a message id, traverse the “has creator” relationship and get the person who wrote it. We have a method called NodeGetNeighbors which lets us do exactly what it sounds like, so we’ll make use of it. It returns a list of neighbors, but we know a message only has a single creator, so we just need to get the first and only neighbor. In Lua the lists start at index 1 (position 0 is reserved for the length of the list). Once have our person node, we grab some of their properties and return a result. Everything below the result dashed line is the actual output from the database:

local person = NodeGetNeighborsForDirectionForType("Message", "4947802324992", 
                                                   Direction.OUT, "HAS_CREATOR")[1]
local result = {
  ["person.id"] = person:getProperty("id"),
  ["person.firstName"] = person:getProperty("firstName"),
  ["person.lastName"] = person:getProperty("lastName")
}

result
------
[{"person.firstName":"Mahinda", "person.lastName":"Perera", "person.id":933}]

You may be wondering why we’re using “local” in front of everything, that’s just so Lua keeps the variable nearby in an array and not in a global hash. It helps the performance of some queries but doesn’t do much here. At least that’s what I’ve been reading, it may be a bit of Cargo Cult Programming on my part. That first query wasn’t so bad. On to the next one. This one wants us to just retrieve a node and one property or another. It should have been the easiest one to write but I want to use an “If Statement” here. You know, that thing that Cypher doesn’t have.

local properties = NodeGetProperties("Message", "4947802324992")
local result = {
  ["message.creationDate"] = date(properties["creationDate"]):fmt("${iso}Z")
}

if (properties["content"] == '') then
  result["message.imageFile"] =  properties["imageFile"]
else
  result["message.content"] = properties["content"]
end 

result
------
[{"message.content":"About Rupert Murdoch, t newer electronic publishing technoAbout George Frideric Handel, concertos. Handel was born in 1685,About Kurt Vonne", "message.creationDate":"2011-08-17T06:05:40.595+0000Z"}]
or 
[{"message.imageFile":"photo1649267441795.jpg", "message.creationDate":"2010-08-19T02:17:46.627+0000Z"}]

Here we get the properties of a node, format the creationDate property the way the LDBC spec wants us to format it and then have an if statement. If the content of the message is empty, we include the imageFile property, if it isn’t, we include the text content. So far so good. Let’s keep going.

Another one that is pretty straight forward. Given a person id, get the properties of that person along with the id of the City where they are located in. So we do just that:

local properties = NodeGetProperties("Person", "933")
local city = NodeGetNeighborsForDirectionForType("Person", "933", 
                                                 Direction.OUT, "IS_LOCATED_IN")[1]
local result = {
  ["person.firstName"] = properties["firstName"],
  ["person.lastName"] = properties["lastName"],
  ["person.birthday"] = properties["birthday"],
  ["person.locationIP"] = properties["locationIP"],
  ["person.browserUsed"] = properties["browserUsed"],
  ["city.id"] = city:getProperty("id"),
  ["person.gender"] = properties["gender"],
  ["person.creationDate"] = date(properties["creationDate"]):fmt("${iso}Z")   
}

result
------
[{"person.lastName":"Perera", "person.browserUsed":"Firefox", "person.birthday":"1989-12-03", "person.creationDate":"2010-02-14T15:32:10.447+0000Z", "person.firstName":"Mahinda", "person.gender":"male", "city.id":1353, "person.locationIP":"119.235.7.103"}]

Query 3 looks pretty simple, except for the sorting at the end. They want us to sort on the creation date of the relationship in descending order and the person id in ascending order if the dates are the same. That is going to make things a bit ugly. We start off by getting the Links of the Person for both directions in the KNOWS relationship type. Then we get just the date from the relationship and the properties from the node. The query would stop here, but then we have to sort…and at the end we have to format the date to the spec.

local knows = NodeGetLinksForType("Person", "17592186055119", "KNOWS")
local friendships = {}
for i, know in pairs(knows) do
  creation = RelationshipGetProperty(know:getRelationshipId(),"creationDate")
  friend = NodeGetPropertiesById(know:getNodeId())
  friendship = {
    ["friend.id"] = friend["id"],
    ["friend.firstName"] = friend["firstName"],
    ["friend.lastName"] = friend["lastName"],   
    ["knows.creationDate"] = creation
  }
  table.insert(friendships, friendship)
end

table.sort(friendships, function(a, b) 
  if a["knows.creationDate"] > b["knows.creationDate"] then
      return true
  end
  if (a["knows.creationDate"] == b["knows.creationDate"]) then 
     return (a["friend.id"] < b["friend.id"] )  
  end
end)

for i = 1, #friendships do
  friendships[i]["knows.creationDate"] = date(friendships[i]["knows.creationDate"]):fmt("${iso}Z") 
end

friendships
-----------
[[{"knows.creationDate":"2012-09-11T20:30:14.559+0000Z", "friend.firstName":"Jason", "friend.lastName":"Jones", "friend.id":32985348840899}, 
{"knows.creationDate":"2012-09-04T02:22:54.999+0000Z", "friend.firstName":"George", "friend.lastName":"Smith", "friend.id":32985348887665},
...

Query 6 has a “star” query. That means “keep going”. We are going to drop into a forum post message thread somewhere along the reply chain and have to traverse UP to the Post, then make our way to the Forum and then the moderator. We should really build a Traversal API here.. but a while loop will do the trick for us. Once we have our message, we get the CONTAINER_OF relationships. If it’s a Post, it will have one, so we can skip the while loop and go the Forum, Moderator, etc. However if it doesn’t have any of those Links, then we go up the REPLY_OF relationship to the next Message in the chain and get the CONTAINER_OF links of that node and restart our while loop. Neat right? That’s not so hard to write when you have a programming language.

local node_id = NodeGetId("Message", "8246337208331")
local links = NodeGetLinksByIdForDirectionForType(node_id, Direction.IN, "CONTAINER_OF")
while (#links == 0) do
    links = NodeGetLinksByIdForDirectionForType(node_id, Direction.OUT, "REPLY_OF")
    node_id = links[1]:getNodeId()
    links = NodeGetLinksByIdForDirectionForType(node_id , Direction.IN, "CONTAINER_OF")  
end
node_id = links[1]:getNodeId()
local forum = NodeGetById(node_id)
local moderator = NodeGetNeighborsByIdForDirectionForType(node_id, 
                                                          Direction.OUT, "HAS_MODERATOR")[1]
local properties = moderator:getProperties()
local result = {
  ["forum.id"] = forum:getProperty("id"),
  ["forum.title"] = forum:getProperty("title"),
  ["moderator.id"] = properties["id"],
  ["moderator.firstName"] = properties["firstName"],
  ["moderator.lastName"] = properties["lastName"]
}

result
------
[{"moderator.lastName":"Perera", "forum.id":0, "forum.title":"Wall of Mahinda Perera", "moderator.firstName":"Mahinda", "moderator.id":933}]

Alright query 7 starts to get more complicated. We get a message and find out who wrote it. Then we find out the people this message author knows. We save their node ids in a table which we will check later. Next we get the comments that replied to our original message, their authors and with their node ids we check the table from earlier to see if the original author knows them… Then we have to sort and format the dates.

local message_node_id = NodeGetId("Message", "8246337208329")
local author = NodeGetNeighborsByIdForDirectionForType(message_node_id, 
                                                       Direction.OUT, "HAS_CREATOR")[1]
local knows = NodeGetLinksByIdForType(author:getId(), "KNOWS")
local knows_ids = {}
for i, know in pairs (knows) do
  table.insert(knows_ids, know:getNodeId())
end

local comments = {}
local replies = NodeGetNeighborsByIdForDirectionForType(message_node_id, 
                                                        Direction.IN, "REPLY_OF")
for i, reply in pairs (replies) do
  local replyAuthor = NodeGetNeighborsByIdForDirectionForType(reply:getId(),
                                                              Direction.OUT, "HAS_CREATOR")[1]
  local properties = replyAuthor:getProperties()
  local comment = {
    ["replyAuthor.id"] = properties["id"],
    ["replyAuthor.firstName"] = properties["firstName"],
    ["replyAuthor.lastName"] = properties["lastName"],
    ["knows"] = not knows_ids[replyAuthor:getId()] == nil,
    ["comment.id"] = reply:getProperty("id"),
    ["comment.content"] = reply:getProperty("content"),
    ["comment.creationDate"] = reply:getProperties()["creationDate"]
  }
  table.insert(comments, comment)
end

table.sort(comments, function(a, b) 
  if a["comment.creationDate"] > b["comment.creationDate"] then
      return true
  end
  if (a["comment.creationDate"] == b["comment.creationDate"]) then 
     return (a["replyAuthor.id"] < b["replyAuthor.id"] )  
  end
end)

for i = 1, #comments do
  comments[i]["comment.creationDate"] = date(comments[i]["comment.creationDate"]):fmt("${iso}Z") 
end

comments
----------
[[{"comment.creationDate":"2012-07-21T03:07:22.319+0000Z", "comment.content":"no", "replyAuthor.id":6597069816195, "replyAuthor.firstName":"Hermann", "comment.id":8246337208331, "replyAuthor.lastName":"Schmidt", "knows":false},
 {"comment.creationDate":"2012-07-21T01:39:21.227+0000Z", "comment.content":"About John Howard, died in combaAbout Isabella I of Castile, r right to thAbo", "replyAuthor.id":36226, "replyAuthor.firstName":"Wolfgang", "comment.id":8246337208332, "replyAuthor.lastName":"Richter", "knows":false}]]

Last and certainly not least is query 2. This beauty has a sort and limit right at the start. We’ve seen sort a bunch of times already but limit is done with table.move and a range. In this case the first 10. Then to make things fun we have another star query to chase the forum post message chain up to the Post and get the original author of the thread… oh and here comes the content or imageFile check too. Lovely.

local person = NodeGet("Person", "21990232564424")
local messages = NodeGetNeighborsByIdForDirectionForType(person:getId(), 
                                                         Direction.IN, "HAS_CREATOR")
table.sort(messages, function(a, b) 
     if a:getProperty("creationDate") > b:getProperty("creationDate") then 
         return true
     elseif a:getProperty("creationDate") == b:getProperty("creationDate") then
        return a:getProperty("id") > b:getProperty("id")
    end
    end)
local smaller = table.move(messages, 1, 10, 1, {})

results = {}
for i, message in pairs(smaller) do
  local properties = message:getProperties()

  local result = {
    ["message.id"] = properties["id"],
    ["message.creationDate"] = date(properties["creationDate"]):fmt("${iso}Z")
  }

  if (properties["content"] == '') then
    result["message.imageFile"] =  properties["imageFile"]
  else
    result["message.content"] = properties["content"]
  end 

  if (properties["type"] == "post") then
      result["post.id"] = properties["id"]
      result["originalPoster.id"] = person:getProperty("id")
      result["originalPoster.firstName"] = person:getProperty("firstName")
      result["originalPoster.lastName"] = person:getProperty("lastName")
  else
    local node_id = message:getId()
    local hasReply = NodeGetLinksByIdForDirectionForType(node_id, Direction.OUT, "REPLY_OF")  
    while (#hasReply > 0) do
      node_id = hasReply[1]:getNodeId()
      hasReply = NodeGetLinksByIdForDirectionForType(node_id, Direction.OUT, "REPLY_OF")  
    end
    local poster = NodeGetNeighborsByIdForDirectionForType(node_id, 
                                                           Direction.OUT, "HAS_CREATOR")[1] 
    local post_id = NodeGetPropertyById(node_id, "id")
      result["post.id"] = post_id
      result["originalPoster.id"] = poster:getProperty("id")
      result["originalPoster.firstName"] = poster:getProperty("firstName")
      result["originalPoster.lastName"] = poster:getProperty("lastName")
  end
    table.insert(results, result)
end

results 
-------
[[{"post.id":8796104790667, "originalPoster.lastName":"Irani", "message.creationDate":"2012-09-13T08:33:24.617+0000Z", "originalPoster.firstName":"Shweta", "message.content":"thanks", "message.id":8796104790670, "originalPoster.id":26388279120306}, 
{"post.id":8796104790624, "originalPoster.lastName":"Irani", "message.creationDate":"2012-09-13T07:51:16.831+0000Z", "originalPoster.firstName":"Shweta", "message.content":"About Robert Bourassa, from May 12, 1970, to NovembeAbout Sammy Cahn, ilms and Broadwa", "message.id":8796104790630, "originalPoster.id":26388279120306}, 
{"post.id":8796104790737, "originalPoster.lastName":"Irani", "message.creationDate":"2012-09-13T05:41:39.012+0000Z", "originalPoster.firstName":"Shweta", "message.content":"About Napoleon, from 1803 tAbout Hermann Göring, ommander ofAbout Michael Schumacher", "message.id":8796104790738, "originalPoster.id":26388279120306},
...

Anyway. There you have it folks. The LDBC Social Network “short” queries 1-7 in RageDB. Some of these were harder to write than others. I feel the pain of some of my decisions here. I really should add Traversal methods, maybe a coalesce. Maybe make it easier to sort. Feeling the pain of working with the database allows me as the person also working on the database to make it better.

If you like to suffer, come help me out. I have a ton more of these LDBC queries to write!

Tagged , , , , ,

Leave a comment