Checking if two nodes are directly connected is something you often have to do in a graph. There are a few different ways to implement this feature depending on how the database keeps track of relationships. In Neo4j a double linked list of relationships is kept per node grouped by the relationship type in both the incoming and outgoing directions. To check if two nodes are directly connected, one has to traverse one of the lists (preferably the shortest one) and checking to see if the other node id is included in that list. If we don’t know the relationship type, we have to check all the groups (for dense nodes, or light nodes there are no groups and we check them all anyway).
In Amazon Neptune the SPOG index can be queried twice. Once with the first node in the S position and the second node in the O position, then again with the positions reversed (with the P position being the relationship type). If we don’t know the relationship type we can query the indexes twice per relationship type.
Checking if two nodes are directly connected is similar to checking for set membership, and one trick we could use is a bloom filter and variant data structures. Long time readers will remember this blog post outlining exactly how to do that and achieve 100x faster checks including a “double check” to get around the probabilistic nature of these data structures.
You may have noticed that homes are selling for ridiculously high prices lately. Some are calling it inflation, others see it as a supply chain squeeze the likes of which we’ve never seen before. I’m in the latter camp and I think things will eventually return to normal levels but the “inflationistas” yearn for a simpler time. A return to the Gold Standard of currency which kept inflation in check. Do you know who else wanted to return to the gold standard?
That’s right, the “bad guy” from the G.I. Joe series wanted to burn all paper currency and use Cobra gold currency. Maybe he wasn’t so bad after all? But he did spend most of his time shouting orders at people. He commanded things to change, much in the same way we command our data to change. There is a great post on StackOverflow on this subject, I’ll reproduce the relevant bits here. If you define:
In the previous blog post we started at 32 and managed to reach 724 requests per second. We also ended it saying that I’m going to investigate a way to vectorize it so we can check multiple node properties in batches instead of doing things one at a time. Well, I found a way and the results are nothing short of outrageous.
In our Find method we were looking at a list of property values and comparing them to a single value one at a time. Computer since the early 90s have been able to perform SIMD (Single Instruction Multiple Data) operations and we’re going to take advantage of that.
At some point in your life you will look at the mirror and not recognize the person staring back at you. It’s not only the way you look, but the choices you made and the results of those choices that will seem puzzling. You may decide right then and there to embark in the most important quest of your life. Finding yourself.
Nodes and Relationships don’t get the luxury of having mirrors, but we still need to find them every once in a while. So far all we are able to do is get a node by a label and key or use the id to get a node or relationship. This is great and covers a lot of use cases, but sometimes the primary id is not what we are looking for. We may be looking for all nodes of a type that share a property value equal to something, or greater/less than something else. How can we go about finding the answers to these types of queries in our graph?
Helene has been testing software since we met in college back in 1999. Today she is the head of QA at S&P Market Intelligence managing hundreds of people. I don’t know how she does it but she is the kind of person that breaks every piece of software she touches… even when she doesn’t want to… like the in-flight entertainment system on a long flight to France. About 15 years ago, I made a terrible mistake and asked her to test a web application I had written. She ripped it apart in minutes. I felt like the worst developer in the world, and I probably wasn’t far off. I never wanted to feel that way again, so I got better at writing tests. Now I’m only ranked second worst, right after that dev who doesn’t test at all.
We decided that our Nodes and Relationships will have Schema in our database. If we create a User Node Type and set an “age” property type as an Integer, then all User nodes will have that property as an Integer. This idea seem simple enough, but what happens if we truly don’t know the value of a property for a node? The user hasn’t told us their age, or for one of many perfectly valid reasons do not know what their age is? Or what if that age property that was once set is deleted? What do we do now?
In Part 5 of this series we talked about how we store properties, but not really how we delete them. In my first design I had “tombstone” (a value indicating this data is no longer here) by type. So for Strings it would be an empty string, for Integers it was the lowest negative value allowed, etc… but what do we do about Booleans? It’s either true or false, it doesn’t lend it self to a tombstone style value… and for the ones that do, does it even make sense?
Jim Morrison died in 1971, I wasn’t even alive then. I didn’t learn about The Doors until the Val Kilmer film from the 90s. I was still too young to really understand them but I became a fan of the music nonetheless. In the last few months I’ve learned more about doors. I learned about this concept of one way and two way doors. The idea being that one way doors can only be broken through once, but if you don’t love the decision you made you can walk through the door two times and be back to where you started.
When I was a new Java developer I would sometimes wake up in the middle of the night hyperventilating and covered in sweat. Usually from a nightmare about Maven and fighting with pom.xml. We dream of Software, but does Software dream? I don’t know, but I hope when Maven goes to sleep at night, it wakes up screaming thinking about CMake and CMakeLists.txt… I know I do.
Rather I should say, “I did”, because I went looking for help on youtube and ran into a template for C++ projects by Jason Turner which made the nightmares stop. We’ll start our new project by blindly copying that into our repository and removing a few GUI related things we won’t be using. Watch the video for all the details, I only understood half of it, but it was enough. There I learned about Conan.
I could not decide if I wanted a Rhinoceros or Cthulhu as a mascot for this new project, so I went with both. The image you see above is what I ended up with. It’s absolutely outrageous and fits perfectly with our theme. That was going to be the name by the way “Outrageous DB” but it was kind of long, so I went with “Rage DB” instead. Right about now you may be thinking, wait, what are we doing, what is this new project? What is going on?