Tag Archives: performance

Finding your neighbors using Neo4j

In Mr. Rogers’ Neighborhood, the question “Won’t you be my neighbor?” is an invitation for somebody to be close to you. In graphs, it’s an invitation to traverse. The closest neighbors of a node are those reachable by a single relationship hop, but we can also consider nodes two, three or more hops away our neighbors as well. How can we find them in Neo4j? Using the “star”:
Continue reading

Tagged , , , , , , , , , ,

Multiple origin multiple destination 3 relationships queries for knowledge graphs using Neo4j

The multiple-origin-multiple-destination (MOMD) problem is an NP-Hard problem sometimes seen in logistics planning where paths can stretch out really far. A far simpler problem presents itself when we limit the size of the paths. Now you may be wondering, why would we do that? Well… outside logistics we have plenty of graphs where relevance drops as we get further and further away. Think about an Article on Wikipedia. It has links to many other articles that are relevant, and those have links to other articles that are relevant to them but less relevant to our starting Article, and those have links to other articles that may be relevant to them, but have very little to do with our starting Article. I think if we keep going we end up in Philosophy or something like that.
Continue reading

Tagged , , , , , , , , , ,

Building a Dating site with Neo4j – Part Seven

Now it is time to create the timeline for our users. Most of the time, the user wants to see posts from people they could High Five in order to elicit a conversation. Sometimes, they want to see what their competition is doing and what kind of posts are getting responses… also who they can low five. I don’t think they don’t want to see messages from people who are not like them and don’t want to date them but I could be wrong.
Continue reading

Tagged , , , , , , , , , ,

Building a Dating site with Neo4j – Part One

You might have already heard that Facebook is getting into the Dating business. Other dating sites have been using graphs in the past and we’ve looked at finding love using the graph before. It has been a while though, so let’s return to the topic making use of the new Date and Geospatial capabilities of Neo4j 3.4. I have to warn you though that I’ve been with Helene for almost 15 years and missed out on all this dating site fun, what I do know I blame Colin for it and some pointers from the comments section of this blog post.
Continue reading

Tagged , , , , , , , , , ,

Adding gRPC to Neo4j

You are probably sick of me saying it, but one of the things I love about Neo4j is that you can customize it any way you want. Extensions, stored procedures, plugins, custom indexes, custom apis, etc. If you want to do it, then you can do it with Neo4j.

So the other day I was like what about this gRPC thing? Many companies standardize their backend using RESTful APIs, others are trying out GraphQL, and some are using gRPC. Neo4j doesn’t support gRPC out of the box, partially because we have our own custom binary protocol “Bolt”, but we can add a rudimentary version of gRPC support quite easily.
Continue reading

Tagged , , , , , , , ,

Stored Procedure to Import Data

A while back I showed you how to write an extension to import the MaxMind city data set. Today is just a repeat of that exercise but instead of using an extension, we will use a stored procedure.

The documentation spells out how to write your own procedures in Chapter 6 so I’m not going to go over that again, but I do want to point out a few things.
Continue reading

Tagged , , , , , , , , ,

Mutual Fund Benchmarks with Neo4j

Just the other day I had a conversation with an Investment Risk Manager about one of the data problems his team was working on and he was wondering if Neo4j could help. Imagine you have about 20,000 mutual funds and etfs and you want to track how they measure up against a benchmark like say the returns of the S&P 500. I’m sorry did I say one? I meant all of them, let’s say 2,000 different benchmarks… and you want to track it every day, for a rolling 5 years period. So that’s 20,000 securities * 2000 benchmarks * 5 years * 252 trading days a year (on average)… or 50 billion data points. That’s a BIG join table if we were using a relational database. How can we efficiently model this in Neo4j?
Continue reading

Tagged , , , , , , , ,

Counting Nodes with Multiple Labels

We have over 6000 users in our #neo4j-users slack channel and get all kinds of requests. About a month ago Thomas Shields asked:

Should counting the set of things with 2 labels really take so long? I’ve got 48M nodes with LabelA and LabelB and the query `MATCH (n:LabelA:LabelB) RETURN COUNT(n)` is taking 80-90 seconds

Let’s see what’s going on by creating a small version of his graph. We will create 1M nodes of LabelA, then 1M nodes with both LabelA and LabelB, and then 1M nodes with just Label B:
Continue reading

Tagged , , , , , ,

Building a Boolean Logic Rules Engine in Neo4j

A boolean logic rules engine was the first project I did for Neo4j before I joined the company some 5 years ago. I was working for some start-up at the time but took a week off to play consultant. I had never built a rules engine before, but as far as I know ignorance has never stopped anyone from trying. Neo4j shipped me to the client site, and put me in a room with a projector and a white board where I live coded with an audience of developers staring at me, analyzing every keystroke and cringing at every typo and failed unit test. I forgot what sleep was, but managed to figure it out and I lost all sense of fear after that experience.

The data model chained together fact nodes with criss crossing relationships each chain containing the same path id property we followed until reaching an end node which triggered a rule. There were a few complications along the way and more complexity near the end for ordering and partial matches. The traversal ended up being some 40 lines of the craziest Gremlin code I ever wrote, but it worked. After the proof of concept, the project was rewritten using the Neo4j Java API because at the time only a handful of people could look at a 40 line Gremlin script and not shudder in horror. I think we’re up to two handfuls now.
Continue reading

Tagged , , , , , , , , , ,

Finding Triplets with Neo4j

A user had an interesting Neo4j question on Stack Overflow the other day:

I have two types of nodes in my graph. One type is Testplan and the other is Tag. Testplans are tagged to Tags. I want most common pairs of Tags that share the same Testplans with a Tag having a specific name. I have been able to achieve the most common Tags sharing the same Testplan with one Tag, but getting confused when trying to do it for pairs of Tags.

Continue reading

Tagged , , , , , , , ,