Random | Max De Marzi

Dec 31 2016

OUR OWN MULTI-MODEL DATABASE – PART 2


If you haven’t read part 1, then do that first or this won’t make sense. Well, nothing makes sense, but this especially won’t.

So before going much further, I decided to benchmark our new database and found that our addNode speed was phenomenal, but creating relationships was taking forever. See some JMH benchmarks below:

Benchmark                                                           Mode  Cnt     Score     Error  Units
ChronicleGraphBenchmark.measureCreateEmptyNodes                    thrpt   10  1548.235 ± 556.615  ops/s
ChronicleGraphBenchmark.measureCreateEmptyNodesAndRelationships    thrpt   10     0.165 ±   0.007  ops/s

Each operation created 1,000 users, so this test shows we can create over a million empty nodes per second. Yeah, ChronicleMap is damn fast. But when I also created 100 relationships for each user (100,000 total), each operation took forever (about 6 seconds). So I opened up YourKit, and you won’t believe what I found out next (come on, that’s some good clickbait).
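Spelled out, the conversion from the JMH scores (operations per second) into nodes and relationships per second looks like this, taking each operation as 1,000 users plus, in the second benchmark, 100 relationships per user:

```latex
\[
\begin{aligned}
\text{empty nodes: } & 1548.235\ \tfrac{\text{ops}}{\text{s}} \times 1000\ \tfrac{\text{nodes}}{\text{op}} \approx 1.55 \times 10^{6}\ \tfrac{\text{nodes}}{\text{s}} \\
\text{nodes + relationships: } & \frac{1}{0.165\ \text{ops/s}} \approx 6.06\ \text{s per op} \\
& 0.165\ \tfrac{\text{ops}}{\text{s}} \times 100{,}000\ \tfrac{\text{rels}}{\text{op}} = 16{,}500\ \tfrac{\text{rels}}{\text{s}}
\end{aligned}
\]
```

These use the mean scores; the ± 556.615 error on the first row puts its low end at roughly 0.99 million nodes per second.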

Tagged github, graph, graph database, performance, relationship graph, software, technology, testing

Dec 30 2016


Java, Random, Testing

Our own Multi-Model Database – Part 1

I may be remembering this wrong, but I think it was Henry Rollins who once asked, “What came first, the shitty Multi-Model Databases or the Drugs?” His confusion was over whether:

A) there were a bunch of developers dicking around with their Mac laptops who wrote a shitty database, put it on GitHub, posted it on Hacker News, and then other developers who were on drugs started using it, or…

B) there were a bunch of developers on ketamine and ecstasy, and somebody said, “let’s write a shitty database.”

I think “A” is probably what happens, and it’s how we end up with over 300 databases on DB-Engines. But what about “B”? Well, I don’t have any good stuff lying around, but I did hurt my foot the other day and the doctors gave me some Tramadol, so let’s down some of that and see what happens.

Tagged database, databases, graph database, multi-model, nosql, performance, relationship graph, technology, testing

Sep 28 2016


Java, Problems, Random

Delivering a Graph Based Search solution to slightly wrong data

When it comes to databases, having good, clean data is always important. More so with graphs, which model concepts as nodes and the relationships between them. Inevitably, you will run into messy data and have to deal with it. Many of our customers’ projects involve connecting multiple data sources to arrive at a “golden record,” or single source of truth. It is a lofty goal, sometimes impossible to achieve, but we can use the relationships in the data to come close.

One option is to extract the features (or tags) of a composite object and see if any other object shares most of these features. If one does, the two are possibly the same object and should be merged instead of creating a new record. A partial subgraph match is akin to a recommendation engine in Neo4j and is pretty trivial to write. Take a look back at a few old blog posts for ideas.
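A minimal sketch of the feature-overlap idea, written as plain Java over in-memory sets rather than as a Neo4j traversal. The class name `FeatureMatcher`, the choice of Jaccard similarity as the overlap measure, and the 0.6 merge threshold are all assumptions for illustration, not necessarily what the full post uses:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: finds an existing record that shares most of a new record's features.
class FeatureMatcher {

    // Jaccard similarity |A ∩ B| / |A ∪ B|, a value in [0, 1].
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // Returns the id of the best-scoring existing record, but only if it reaches the threshold.
    static Optional<String> bestMatch(Set<String> incoming,
                                      Map<String, Set<String>> existing,
                                      double threshold) {
        String bestId = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Set<String>> e : existing.entrySet()) {
            double score = jaccard(incoming, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                bestId = e.getKey();
            }
        }
        return (bestId != null && bestScore >= threshold) ? Optional.of(bestId) : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> existing = Map.of(
            "record-1", Set.of("acme", "123 main st", "555-0100", "widgets"),
            "record-2", Set.of("globex", "9 elm st", "555-0199"));
        Set<String> incoming = Set.of("acme", "123 main st", "555-0100");
        // 3 shared features out of 4 distinct ones gives 0.75 >= 0.6, so this prints "record-1"
        System.out.println(bestMatch(incoming, existing, 0.6).orElse("create new record"));
    }
}
```

A score below the threshold means no candidate is close enough, so the caller would create a new record instead of merging.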

Tagged big data, graph database, graph search, java, neo4j, network, performance, relationship graph, search, software, technology

Oct 16 2015


Java, Problems, Random, Testing

Benchmarks and Superchargers

For the most part, I hate competitive benchmarks. The vendor who publishes them always seems to come out on top regardless. The numbers are always amazing, but once you start digging in a little, you start to see flaws in what is actually being measured, and the results never seem to apply to real-world workloads. For example, Cassandra claims 1 million writes per second on 300 servers. Then Aerospike claims 1 million writes per second on 50 servers. MongoDB claims almost 32k writes per second on a single server, but also claims Cassandra can only do 6k w/s and Couch only 1.2k w/s on a single server… Then ScyllaDB posts almost 2 million writes per second on 3 servers, blowing everybody away.
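Part of what makes these headline numbers hard to compare is that each one comes from a different cluster size. The sketch below only divides each quoted claim by its server count; it ignores replication factor, hardware, record size, and consistency settings (none of which are given here) and treats the two “almost” figures as round upper bounds. The class name `PerServerThroughput` is just for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Normalizes the write-throughput claims quoted above to writes per second per server.
class PerServerThroughput {

    static double perServer(double writesPerSecond, int servers) {
        return writesPerSecond / servers;
    }

    public static void main(String[] args) {
        // {claimed writes/s, servers}; "almost" figures rounded up to the quoted value.
        Map<String, double[]> claims = new LinkedHashMap<>();
        claims.put("Cassandra", new double[] {1_000_000, 300});
        claims.put("Aerospike", new double[] {1_000_000, 50});
        claims.put("MongoDB", new double[] {32_000, 1});
        claims.put("ScyllaDB", new double[] {2_000_000, 3});
        for (Map.Entry<String, double[]> e : claims.entrySet()) {
            double[] c = e.getValue();
            System.out.printf("%-10s %,12.0f writes/s per server%n",
                e.getKey(), perServer(c[0], (int) c[1]));
        }
    }
}
```

Even this crude normalization spreads the claims from about 3,333 to about 666,667 writes per second per server, a 200-fold range.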

Tagged cypher, graph, graph database, java, neo4j, performance, testing

Aug 26 2015


Problems, Random

Modeling Airline Flights in Neo4j

Actor Leonardo DiCaprio as Frank Abagnale in the Steven Spielberg movie “Catch Me If You Can”

If you’ve come to any of the Neo4j Data Modeling classes I’ve taught, you must have heard me say “your model depends on both your data and your queries” about a million times. Let’s take a closer look at what this means by examining how one might model airline flight data in Neo4j.

Tagged graph, graph database, modeling, models, neo4j

Apr 14 2015


Cypher, Problems, Random

Importing the Hacker News Interest Graph

Graphs are everywhere. Think about the computer networks that allow you to read this sentence, the road or train networks that get you to work, the social network that surrounds you and the interest graph that holds your attention. Everywhere you look, graphs. If you manage to look somewhere and you don’t see a graph, then you may be looking at an opportunity to build one. Today we are going to do just that. We are going to make use of the new Neo4j Import tool to build a graph of the things that interest Hacker News.

Tagged cypher, graph database, graph databases, hacker news, hcir, interest graph, neo4j, network, nosql, relationship graph, ruby, software, technology

Feb 27 2015

Caching Immutable Id lookups in Neo4j

If you’ve been following my blog for a while, you probably know I like using YourKit and Gatling for end-to-end testing of requests in Neo4j. Today, however, we are going to do something a little different: micro-benchmarking a very small piece of code within our Unmanaged Extension using a Java library called JMH.
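JMH needs its own dependency, so the sketch below sticks to the JDK and only illustrates the kind of code being cached: a lookup whose answer never changes once assigned, such as an external key mapped to an internal id. Because the mapping is immutable, cached entries never need invalidating. `ImmutableIdCache` and the `slowLookup` function are hypothetical stand-ins, not Neo4j API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache for immutable key -> id lookups. Entries are never invalidated,
// since an id, once assigned, does not change (memory use grows with distinct keys).
class ImmutableIdCache {
    private final Map<String, Long> cache = new ConcurrentHashMap<>();
    private final Function<String, Long> slowLookup;

    // slowLookup is assumed to always find the key; returning null would break unboxing below.
    ImmutableIdCache(Function<String, Long> slowLookup) {
        this.slowLookup = slowLookup;
    }

    long idFor(String key) {
        // Only the first request for a key reaches slowLookup; later ones hit the map.
        return cache.computeIfAbsent(key, slowLookup);
    }

    public static void main(String[] args) {
        AtomicInteger slowCalls = new AtomicInteger();
        ImmutableIdCache ids = new ImmutableIdCache(key -> {
            slowCalls.incrementAndGet();  // stands in for an index lookup
            return (long) key.hashCode(); // placeholder id, not a real node id
        });
        ids.idFor("user-42");
        ids.idFor("user-42");
        System.out.println("slow lookups: " + slowCalls.get()); // prints "slow lookups: 1"
    }
}
```

In a JMH run, the comparison of interest would be `idFor` against calling the slow lookup directly.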


Tagged cache, graph database, neo4j, performance, testing

Jun 09 2014


Java, Random

Kickstarting a Neo4j Video Series

Learn how to build high performance @neo4j applications with this video training course.

I’m on Kickstarter asking for your help to create a set of videos that will teach you how to build high-performance Neo4j applications. I am going to capture the lessons I’ve learned over the past 4 years working with graph databases and share them with you.

These videos will teach you everything you need to know about building high-performance applications using Neo4j.

Tagged graph, graph database, java, neo4j, performance, technology, testing

Feb 12 2014


Java, Problems, Random

Online Payment Risk Management with Neo4j

I really like this saying by Corey Lanum:

"Almost all fraud cases involve the fabrication of a relationship, so… model your data to highlight relationships" — @corey_lanum

— Max De Marzi (@maxdemarzi) November 21, 2013

Finding the relationships that should not be there is a great use case for Neo4j, and today I want to highlight an example of why. When you purchase something online, the merchant hands off your information to a payment gateway, which processes your actual payment. Before accepting the transaction, the gateway runs it through a series of risk management tests to validate that it is a real transaction and to protect itself from fraud. One of the hardest things for SQL-based systems to do is cross-check the incoming payment information against existing data, looking for relationships that shouldn’t be there.
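A minimal sketch of that cross-check as a one-hop lookup: each identifier on a payment (card, email, IP address, and so on) is linked to the accounts that have used it, and an incoming payment is flagged if its identifiers lead to other accounts. This is plain in-memory Java, not Neo4j’s API or the post’s actual model; the class, method, and identifier names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the graph: identifier -> accounts that have used it.
class SharedIdentifierCheck {
    private final Map<String, Set<String>> accountsByIdentifier = new HashMap<>();

    void addAccount(String account, String... identifiers) {
        for (String id : identifiers) {
            accountsByIdentifier.computeIfAbsent(id, k -> new HashSet<>()).add(account);
        }
    }

    // Other accounts sharing any identifier with the incoming payment:
    // for a legitimate new customer, these links usually should not exist.
    Set<String> suspiciousLinks(String account, String... identifiers) {
        Set<String> others = new HashSet<>();
        for (String id : identifiers) {
            others.addAll(accountsByIdentifier.getOrDefault(id, Set.of()));
        }
        others.remove(account);
        return others;
    }

    public static void main(String[] args) {
        SharedIdentifierCheck graph = new SharedIdentifierCheck();
        graph.addAccount("alice", "card:4111", "email:alice@example.com", "ip:10.0.0.1");
        graph.addAccount("bob", "card:5500", "email:bob@example.com", "ip:10.0.0.2");
        // A new payment from "carol" reuses Alice's card and Bob's IP address,
        // so this prints a set containing alice and bob (order may vary).
        System.out.println(graph.suspiciousLinks("carol",
            "card:4111", "email:carol@example.com", "ip:10.0.0.2"));
    }
}
```

In a graph database the same question becomes a short traversal from the payment’s identifier nodes, rather than a series of joins across tables.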

Tagged credit cards, fraud, graph database, java, neo4j, network, payment, performance, risk

Dec 31 2013


Java, Problems, Random

The Power of Open Source Software


One of the benefits of Open Source Software is that if you want to change how something is done, you can. At Neo Technology, we have a small team of “Field Engineers” who don’t really work ON the product but rather WITH the product. We help our customers with issues of all kinds, answer questions, give suggestions, and do whatever we need to do to make people’s projects successful. A little while back I had a support ticket about a traversal that was taking longer than the customer had hoped.

Think about a social network: one of the things you may want to do is tell users how big their friends network is. But why stop there? How about their friends-of-friends, or even friends-of-friends-of-friends, network? These are the kinds of questions graph databases excel at compared to relational databases. Let’s take a look at what they were doing.
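The customer’s actual traversal is in the full post; as a stand-in, here is a minimal level-by-level breadth-first search that counts how many distinct people are reachable within a given number of hops. It runs over a plain in-memory adjacency map rather than Neo4j’s traversal API, and it assumes “network size at depth n” means everyone within n hops, excluding the starting user. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NetworkSize {

    // Counts distinct people reachable within maxDepth hops, excluding the start person.
    static int networkSize(Map<String, List<String>> friends, String start, int maxDepth) {
        Set<String> visited = new HashSet<>();
        visited.add(start);
        List<String> frontier = List.of(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String person : frontier) {
                for (String friend : friends.getOrDefault(person, List.of())) {
                    // visited.add returns false for someone already counted via another path
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return visited.size() - 1;
    }

    public static void main(String[] args) {
        Map<String, List<String>> friends = Map.of(
            "max", List.of("ann", "bob"),
            "ann", List.of("max", "cat"),
            "bob", List.of("max", "cat", "dan"),
            "cat", List.of("ann", "bob", "eve"),
            "dan", List.of("bob"),
            "eve", List.of("cat"));
        // prints "depth 1: 2", "depth 2: 4", "depth 3: 5"
        for (int depth = 1; depth <= 3; depth++) {
            System.out.println("depth " + depth + ": " + networkSize(friends, "max", depth));
        }
    }
}
```

Tracking visited people is what keeps the count honest: "cat" is a friend of both "ann" and "bob" but is counted once, and the frontier never re-expands anyone already seen.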

Tagged github, graph database, java, neo4j, network, performance, testing

Max De Marzi

Graphs, Graphs, and nothing but the Graphs

Category Archives: Random
