Tag Archives: performance

Custom Importers

When it comes to getting data into Neo4j, you have a ton of options. You can use LOAD CSV from Cypher, you can use the Import Tool, you can use the JDBC connector in APOC, and possibly a few more options I’m forgetting. Some of these require the data to be in a specific format; others require you to write a little custom Cypher. These work very well most of the time, but sometimes you run into data in weird shapes, coming from vendors who aren’t willing to change just for you. What do you do in that case? Well, you write a custom importer. I’m going to show you how by importing the Cities database from MaxMind.
Continue reading

Speeding up Traversals

A few folks have come to us recently with the need to trace the lineage of nodes across a variable number of hops, often many levels deep. You can run into this need if you are looking at the ancestries of living things, tracing data as it flows through an ETL, mapping connectivity in large networks, etc. These types of queries tend to be murder on relational databases because of the massive recursive joins they have to deal with. Let’s give them a try in Neo4j.
Continue reading

Scaling Cypher Writes

Let’s talk about writes, baby. Let’s talk about you and me. Let’s talk about all the good things. And the bad things that may be. Let’s talk about writes, and indexing and batching, and transactions in Neo4j. Let’s start with my environment: a 3-year-old MacBook Pro (dying to get the new ones… once they finally come out) running a 4-core 2.3 GHz Intel Core i7 that is hyper-threading and pretending to have 8, and an Apple SM256E SSD that is about average as far as SSDs go. It’s definitely not a production-grade server, so bear that in mind.
Continue reading

Benchmarks and Superchargers

For the most part, I hate competitive benchmarks. The vendor who publishes them always seems to come out on top regardless. The numbers are always amazing, but once you start digging in a little bit you start to see faults in what is actually being measured, and it never applies to real-world workloads. For example, you have Cassandra claiming 1 million writes per second on 300 servers. Then Aerospike claiming 1 million writes per second on 50 servers. MongoDB claiming almost 32k writes per second on a single server, but claiming Cassandra can only do 6k w/s and Couch can only do 1.2k w/s on a single server… Then ScyllaDB has almost 2 million writes per second on 3 servers, blowing everybody away.
Continue reading

Using the Testing Harness for Neo4j Extensions

I’ve been creating both unit tests and integration tests for Neo4j Unmanaged Extensions for far too long. The Neo4j Testing Harness was introduced in version 2.1.6 to simplify our lives and just do integration tests. Let’s try it on and see just how awesome we look. First thing we need to do is add the dependency to our project:
Continue reading

One Direction Relationships in Neo4j

In the Neo4j Property Graph model, every single Relationship must be Typed and Directed. This means each one must have a specific name (FRIENDS, LIKES, FOLLOWS, etc.) and a Start Node and an End Node to show direction. What’s neat is that when you write your queries, you can choose to ignore the direction, the type, or both. The following queries are all valid:

// Get all the people I follow 
MATCH (u1:Person)-[:FOLLOWS]->(u2:Person)
WHERE u1.username = "maxdemarzi"
RETURN u2.username

// Get all the people I follow or who follow me
MATCH (u1:Person)-[:FOLLOWS]-(u2:Person)
WHERE u1.username = "maxdemarzi"
RETURN u2.username

// Get all the people related to me (any relationship type, either direction)
MATCH (u1:Person)--(u2:Person)
WHERE u1.username = "maxdemarzi"
RETURN u2.username

Continue reading

Giving Neo4j 2.2 a Workout

Neo4j 2.2 is getting released any day now, so let’s put the Release Candidate through its paces with Gatling. Once we download and start it up, you’ll notice it wants us to authenticate.
Continue reading

Caching Immutable Id lookups in Neo4j

If you’ve been following my blog for a while, you probably know I like using YourKit and Gatling for testing end-to-end requests in Neo4j. Today, however, we are going to do something a little different. We are going to be micro-benchmarking a very small piece of code within our Unmanaged Extension using a Java library called JMH.

Continue reading

Remote Profiling Neo4j with YourKit on AWS

A few months ago, Mark Needham blogged about how to set up remote monitoring of Neo4j using YourKit. I was asked the other day about getting a few more details on how to do this on Amazon, so here is my attempt at that. The first thing we’ll do is set up Neo4j on a Virtual Private Cloud. It’s good practice to not put your databases directly on the public internet.
Continue reading

Scaling Concurrent Writes in Neo4j

A while ago, I showed you a way to scale Neo4j writes using RabbitMQ. It was kinda cool, but some of you asked me for a different solution that didn’t involve adding yet another software component to the stack.

Turns out we can do this in just Neo4j with a little help from the Guava library. The solution involves a background service that holds the writes in a queue and, every once in a while (say, every second), commits them all in one transaction.
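To make the idea concrete, here is a rough sketch of the queue-and-flush pattern. It is not the code from the full post: it swaps Guava for plain java.util.concurrent, and WriteBatcher, submit, flush, and the committer callback are all hypothetical names. The committer stands in for whatever opens a Neo4j transaction, applies the batch, and commits it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: queues incoming writes and hands them to a
// committer in batches, so many small writes share one transaction.
class WriteBatcher<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final Consumer<List<T>> committer; // assumed: wraps one transaction
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    WriteBatcher(Consumer<List<T>> committer) {
        this.committer = committer;
    }

    // Called by request threads; returns immediately without touching the database.
    void submit(T write) {
        queue.add(write);
    }

    // Drains everything queued so far and commits it as a single batch.
    // Returns the number of writes that were in the batch.
    synchronized int flush() {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch);
        if (!batch.isEmpty()) {
            committer.accept(batch);
        }
        return batch.size();
    }

    // Runs flush() in the background at a fixed delay (for example, 1000 ms).
    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(
                this::flush, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Stops scheduling further flushes, then commits whatever is still queued.
    void stop() {
        scheduler.shutdown();
        flush();
    }
}
```

You would call start(1000) once at startup and submit(...) from each request. The trade-off is that a write is not durable until the next flush, so anything still sitting in the queue when the process dies is lost.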
Continue reading
