performance | Max De Marzi

Sep 26 2016

Custom Importers

When it comes to getting data into Neo4j, you have a ton of options. You can use LOAD CSV from Cypher, the Import Tool, the JDBC connector in APOC, and possibly a few more I’m forgetting. Some of these require the data to be in a specific format; others require you to write a little custom Cypher. These work very well most of the time, but sometimes you run into data in weird shapes, coming in from vendors who aren’t willing to change it just for you. What do you do in that case? Well, you write a custom importer. I’m going to show you how by importing the Cities database from MaxMind.
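The heart of most custom importers is remembering which entities you have already created, so that values repeated on every row of a denormalized file become a single node. Here is a minimal sketch in plain Java, assuming a hypothetical comma-separated layout `cityId,cityName,countryCode,countryName` (this is not MaxMind’s actual file format). The class name `CityImporterSketch` is made up, and its node and relationship lists only stand in for whatever actually writes to Neo4j.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of a custom importer's core loop.
// Assumed (hypothetical) row layout: cityId,cityName,countryCode,countryName
class CityImporterSketch {
    // Natural key (country code) -> internal id we assigned, so each
    // country is created once no matter how many rows repeat it.
    final Map<String, Long> countryIds = new LinkedHashMap<>();
    // Stand-ins for real writes to the database.
    final List<String> nodes = new ArrayList<>();
    final List<String> relationships = new ArrayList<>();
    private long nextId = 0;

    void importRow(String line) {
        String[] cols = line.split(",", -1);
        String cityName = cols[1];
        String countryCode = cols[2];
        String countryName = cols[3];

        // Create the country node only the first time we see its code.
        Long countryId = countryIds.get(countryCode);
        if (countryId == null) {
            countryId = nextId++;
            countryIds.put(countryCode, countryId);
            nodes.add("Country:" + countryName);
        }

        // Every row produces one city node linked to its country.
        long cityId = nextId++;
        nodes.add("City:" + cityName);
        relationships.add(cityId + "-[:IN_COUNTRY]->" + countryId);
    }

    public static void main(String[] args) {
        CityImporterSketch importer = new CityImporterSketch();
        importer.importRow("1,Chicago,US,United States");
        importer.importRow("2,Austin,US,United States");
        importer.importRow("3,Toronto,CA,Canada");
        System.out.println(importer.nodes);
        System.out.println(importer.relationships);
    }
}
```

Keeping the key-to-id map in memory is what lets a single pass over the file link every city to its country without asking the database on each row.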

Tagged graph database, import, maxmind, neo4j, nosql, performance

Feb 20 2016

Cypher, Java, Testing

Speeding up Traversals

A few folks have come to us recently needing to trace lineages of nodes many hops away, at variable depth. You can run into this need when looking at the ancestries of living things, tracing data as it flows through an ETL pipeline, mapping large network connectivity, etc. These types of queries tend to be murder on relational databases because of the massive recursive joins they have to deal with. Let’s give them a try in Neo4j.
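To make the shape of these queries concrete, here is a minimal in-memory sketch of a depth-limited lineage walk in plain Java. It is not Neo4j’s traversal API; the `parents` map and the `ancestors` method are hypothetical stand-ins. Each hop is a direct lookup of a node’s neighbors rather than another self-join, and the `seen` set keeps an ancestor reachable by several paths from being visited twice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LineageSketch {
    // Breadth-first walk up a hypothetical child -> parents map, stopping after maxDepth hops.
    static List<String> ancestors(Map<String, List<String>> parents, String start, int maxDepth) {
        List<String> found = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        seen.add(start);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);

        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                // One hop is a direct lookup of the node's parents, not another join.
                for (String parent : parents.getOrDefault(node, List.of())) {
                    if (seen.add(parent)) { // skip ancestors already reached by another path
                        found.add(parent);
                        next.add(parent);
                    }
                }
            }
            frontier = next;
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, List<String>> parents = Map.of(
                "c", List.of("b"),
                "b", List.of("a"),
                "a", List.of("root"));
        System.out.println(ancestors(parents, "c", 10)); // [b, a, root]
        System.out.println(ancestors(parents, "c", 2));  // [b, a]
    }
}
```

The depth limit corresponds to the upper bound you would put on a variable-length pattern; leaving it large simply walks until the frontier runs out.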

Tagged cypher, graph, graph database, neo4j, performance, technology

Feb 18 2016

Cypher, Java, Problems, Testing

Scaling Cypher Writes

Let’s talk about writes, baby. Let’s talk about you and me. Let’s talk about all the good things. And the bad things that may be. Let’s talk about writes, and indexing and batching, and transactions in Neo4j. Let’s start with my environment: a 3-year-old MacBook Pro (dying to get the new ones… once they finally come out) running a 4-core 2.3 GHz Intel Core i7 with hyper-threading, so it pretends to have 8, plus an Apple SM256E SSD that is about average as far as SSDs go. This is definitely not a production-grade server, so bear that in mind.

Tagged cypher, graph database, java, neo4j, performance, technology, testing

Oct 16 2015

Java, Problems, Random, Testing

Benchmarks and Superchargers

For the most part, I hate competitive benchmarks. The vendor who publishes them always seems to come out on top regardless. The numbers are always amazing, but once you start digging in a little, you start to see faults in what is actually being measured, and the results never apply to real-world workloads. For example, Cassandra claims 1 million writes per second on 300 servers. Then Aerospike claims 1 million writes per second on 50 servers. MongoDB claims almost 32k writes per second on a single server, while claiming Cassandra can only do 6k w/s and Couch only 1.2k w/s on a single server… Then ScyllaDB shows almost 2 million writes per second on 3 servers, blowing everybody away.
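One quick way to see how poorly these headline numbers line up is to divide each claim by its server count. The figures below are the ones quoted above; the two "almost" values are rounded up to 2,000,000 and 32,000, so those results are upper bounds. This only normalizes for cluster size and says nothing about hardware, payload size, replication, or durability settings, which is where such comparisons usually fall apart.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PerServerThroughput {
    // Claimed cluster-wide writes per second and server count, as quoted in the text.
    // "Almost" figures are rounded up, so ScyllaDB and MongoDB results are upper bounds.
    static Map<String, Long> perServer() {
        Map<String, long[]> claims = new LinkedHashMap<>();
        claims.put("Cassandra", new long[] {1_000_000L, 300L});
        claims.put("Aerospike", new long[] {1_000_000L, 50L});
        claims.put("ScyllaDB", new long[] {2_000_000L, 3L});
        claims.put("MongoDB", new long[] {32_000L, 1L});

        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, long[]> claim : claims.entrySet()) {
            long writesPerSecond = claim.getValue()[0];
            long servers = claim.getValue()[1];
            result.put(claim.getKey(), writesPerSecond / servers); // integer division
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {Cassandra=3333, Aerospike=20000, ScyllaDB=666666, MongoDB=32000}
        System.out.println(perServer());
    }
}
```

Per server, the quoted claims span more than two orders of magnitude, which is a hint that they are not measuring the same thing.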

Tagged cypher, graph, graph database, java, neo4j, performance, testing

Jul 05 2015

Java, Testing

Using the Testing Harness for Neo4j Extensions

I’ve been creating both unit tests and integration tests for Neo4j Unmanaged Extensions for far too long. The Neo4j Testing Harness, introduced in version 2.1.6, simplifies our lives by letting us just write integration tests. Let’s try it on and see just how awesome we look. The first thing we need to do is add the dependency to our project:

Tagged github, graph, graph database, java, neo4j, nosql, performance, testing

Mar 17 2015

Cypher, Java, Problems

One Direction Relationships in Neo4j

In the Neo4j Property Graph model, every single Relationship must be Typed and Directed. This means each one must have a specific name (FRIENDS, LIKES, FOLLOWS, etc.) and a Start Node and an End Node to show direction. What’s neat is that when you write your queries, you can choose to ignore that. The following queries are all valid:

// Get all the people I follow 
MATCH (u1:Person)-[:FOLLOWS]->(u2:Person)
WHERE u1.username = "maxdemarzi"
RETURN u2.username

// Get all the people that I follow or follow me
MATCH (u1:Person)-[:FOLLOWS]-(u2:Person)
WHERE u1.username = "maxdemarzi"
RETURN u2.username

// Get all the people related to me 
MATCH (u1:Person)--(u2:Person)
WHERE u1.username = "maxdemarzi"
RETURN u2.username
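A rough way to picture what those three patterns ask for: each relationship is stored once, with a type, a start node and an end node, and the query decides whether the type and the direction matter. The following plain-Java sketch mirrors the three queries with a type filter and an outgoing-only flag. The `Rel` class, the `neighbors` method and the users alice, bob and dave are hypothetical; this is not how Neo4j stores relationships internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DirectionSketch {
    // Every relationship has exactly one type, one start node and one end node.
    static final class Rel {
        final String type, start, end;
        Rel(String type, String start, String end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    final List<Rel> rels = new ArrayList<>();

    void relate(String start, String type, String end) {
        rels.add(new Rel(type, start, end));
    }

    // type == null matches any type; outgoingOnly == false also follows relationships backwards.
    Set<String> neighbors(String user, String type, boolean outgoingOnly) {
        Set<String> result = new LinkedHashSet<>();
        for (Rel r : rels) {
            if (type != null && !r.type.equals(type)) continue;
            if (r.start.equals(user)) {
                result.add(r.end);
            } else if (!outgoingOnly && r.end.equals(user)) {
                result.add(r.start);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DirectionSketch g = new DirectionSketch();
        g.relate("maxdemarzi", "FOLLOWS", "alice");
        g.relate("bob", "FOLLOWS", "maxdemarzi");
        g.relate("dave", "FRIENDS", "maxdemarzi");

        System.out.println(g.neighbors("maxdemarzi", "FOLLOWS", true));  // (u1)-[:FOLLOWS]->(u2): [alice]
        System.out.println(g.neighbors("maxdemarzi", "FOLLOWS", false)); // (u1)-[:FOLLOWS]-(u2): [alice, bob]
        System.out.println(g.neighbors("maxdemarzi", null, false));      // (u1)--(u2): [alice, bob, dave]
    }
}
```

The same three stored relationships answer all three questions; only the filters applied at query time change.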

Tagged graph database, java, neo4j, performance, software, technology

Mar 08 2015

Giving Neo4j 2.2 a Workout

Neo4j 2.2 is getting released any day now, so let’s put the Release Candidate through its paces with Gatling. Once we download and start it up, you’ll notice it wants us to authenticate.

Tagged cypher, gatling, graph database, neo4j, network, performance, software, technology, testing

Feb 27 2015

Caching Immutable Id lookups in Neo4j

If you’ve been following my blog for a while, you probably know I like using YourKit and Gatling for testing end-to-end requests in Neo4j. Today, however, we are going to do something a little different: micro-benchmarking a very small piece of code within our Unmanaged Extension using a Java library called JMH.
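The idea behind caching immutable id lookups can be sketched without Neo4j or JMH: if the mapping from a key (say, a username) to a node id never changes once created, the first successful lookup can be remembered and every later call served from memory. In this minimal sketch, `slowLookup` is a hypothetical stand-in for the index lookup done inside an extension; it is not the code from the post, and measuring the speedup would still be a job for JMH.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class IdCacheSketch {
    private final Map<String, Long> cache = new ConcurrentHashMap<>();
    // Hypothetical stand-in for the expensive lookup (e.g. an index query); returns null if not found.
    private final Function<String, Long> slowLookup;

    IdCacheSketch(Function<String, Long> slowLookup) {
        this.slowLookup = slowLookup;
    }

    // Safe to cache only because the key -> id mapping is immutable.
    // Misses (null) are not stored, so a key created later will still be found.
    Long nodeIdFor(String key) {
        return cache.computeIfAbsent(key, slowLookup);
    }

    public static void main(String[] args) {
        Map<String, Long> index = Map.of("maxdemarzi", 42L);
        AtomicInteger calls = new AtomicInteger();
        IdCacheSketch ids = new IdCacheSketch(key -> {
            calls.incrementAndGet();
            return index.get(key);
        });

        ids.nodeIdFor("maxdemarzi");
        ids.nodeIdFor("maxdemarzi");
        System.out.println(calls.get()); // 1: the second call was served from the cache
    }
}
```

`ConcurrentHashMap.computeIfAbsent` keeps the cache safe to share across request threads without extra locking.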

Tagged cache, graph database, neo4j, performance, testing

Feb 24 2015

Remote Profiling Neo4j with YourKit on AWS

A few months ago, Mark Needham blogged about how to set up remote monitoring of Neo4j using YourKit. I was asked the other day for a few more details on how to do this on Amazon, so here is my attempt at that. The first thing we’ll do is set up Neo4j in a Virtual Private Cloud. It’s good practice not to put your databases directly on the public internet.

Tagged graph database, neo4j, performance, profiling

Jul 01 2014

Java, Problems

Scaling Concurrent Writes in Neo4j

A while ago, I showed you a way to scale Neo4j writes using RabbitMQ, which was kinda cool, but some of you asked me for a different solution that didn’t involve adding yet another software component to the stack.

Turns out we can do this in just Neo4j, with a little help from the Guava library. The solution involves a background service that holds the writes in a queue and, every once in a while (say, every second), commits those writes in one transaction.
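Here is a minimal sketch of that queue-and-flush pattern. The post uses Guava; this version swaps in plain `java.util.concurrent` (a `LinkedBlockingQueue` plus a `ScheduledExecutorService`) so it has no dependencies. `commitBatch` is a hypothetical callback standing in for opening one Neo4j transaction, applying the batch, and committing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class BatchingWriterSketch<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    // Hypothetical: would open one transaction, apply every write in the batch, and commit.
    private final Consumer<List<T>> commitBatch;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    BatchingWriterSketch(Consumer<List<T>> commitBatch) {
        this.commitBatch = commitBatch;
    }

    // Callers return immediately; the write is applied on the next flush.
    void submit(T write) {
        queue.add(write);
    }

    // Drain everything queued so far and commit it as a single batch.
    synchronized void flush() {
        List<T> batch = new ArrayList<>();
        queue.drainTo(batch);
        if (!batch.isEmpty()) {
            commitBatch.accept(batch);
        }
    }

    // If flush ever throws, the executor stops rescheduling it,
    // so real code should handle exceptions inside commitBatch.
    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(this::flush, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
        flush(); // commit anything still queued
    }

    public static void main(String[] args) throws InterruptedException {
        BatchingWriterSketch<String> writer = new BatchingWriterSketch<>(
                batch -> System.out.println("committing " + batch.size() + " writes in one transaction"));
        writer.start(1000); // flush roughly every second
        for (int i = 0; i < 500; i++) {
            writer.submit("write-" + i);
        }
        writer.stop();
    }
}
```

The trade-off is latency for throughput: callers return immediately, but a write is not committed until the next flush, up to one period later, and anything still in the in-memory queue is lost if the process dies first. In exchange, one commit covers many writes.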

Tagged github, graph, graph database, graph databases, java, neo4j, performance, relationship graph, testing
