big data | Max De Marzi

Mar 20 2017

Searching for objects using multiple dimensions

Lets take a look at a scenario where you are trying to search for things by their attributes, not their description. They can be users, documents, or any object that could be described by discrete values in multiple dimensions. What does that mean exactly? Well, let me give you an example: searching for a dog. My family includes 2 four legged furry creatures named Tyler and Ronnie. They are my half lab, half golden retrievers. Dogs come in all shapes and sizes, from teacup breeds with adult weights around 5 lbs, to giant Mastiff breeds over 150 lbs. But most people don’t care exactly how much a dog weights, only their general size.

Continue reading →

Tagged big data, graph database, graph databases, neo4j, network, nosql, performance, relationship graph, software, technology

Dec 27 2016

1 Comment

Java, Problems

Connected

The Stereo MC’s song “Connected” could be about some recently gained insight and the realization that maybe some of the people you held dear are phonies and while the reality of the situation is scary, you cannot allow yourself to turn a blind eye anymore or allow yourself to backslide by disconnecting from the real world.

Or it could be a warning about how we’ve all been blinded by SQL databases for too long and we must instead look to connect our data with Graph Databases. About how those new connections may be scary (like because of fraud detection) but they are necessary to better understand reality.

Either way, we may want to see if two nodes in Neo4j are connected and I’m going to show you how to do that faster.
Continue reading →

Tagged big data, graph, graph database, graph databases, java, neo4j, performance, relationship graph, software, technology

Sep 28 2016

2 Comments

Java, Problems, Random

Delivering a Graph Based Search solution to slightly wrong data

When it comes to databases, having good clean data is always important. More so with Graphs which deal with concepts as nodes and their relationships between them. Inevitably, you will run into messy data and have to deal with it. In a lot of the projects our customers work on they are dealing with connecting multiple data sources to get to a “golden record” or single source of truth. A lofty goal, sometimes impossible to achieve, but we can use the relationships of the data to help us come close.

One option is to extract the features (or tags) of a composite object and see if any other object shares most of these features. If that is the case then they are possibly the same object and should be merged instead of creating a new record. A partial subgraph match is something akin to a recommendation engine in Neo4j and pretty trivial to write. Take a look back at a few old blog posts for ideas.
Continue reading →

Tagged big data, graph database, graph search, java, neo4j, network, performance, relationship graph, search, software, technology

Nov 25 2013

3 Comments

Deployment, Problems

Scaling Up

Rock climbing is a physically and mentally demanding sport, it test the limits of one’s strength, endurance, agility, balance and concentration. Sasha DiGiulian is one of the best rock climbers in the world. I can’t get past 15 feet without starting to panic and freak out. Maybe it’s because I’m afraid of heights…and overweight, but I’m just not right for that kind of challenge.
Continue reading →

Tagged big data, graph, graph database, neo4j, performance, testing

Aug 16 2012

2 Comments

Random

Getting a Big Neo4j Test Box for Cheap!

When embarking on a new Neo4j project, one of the things you have to figure out is where to run it. Most of the time the answer is just your laptop. Other times, using Heroku works great. However, if you are at the stage of your testing where you have billions of nodes and relationships, you need something a little bigger.

If you are not ready to commit to purchasing a 100k server for testing, then I suggest you borrow one for a short time. You can try to spin up an Amazon EC2 instance, the high memory large ones go up to 60 gigs of RAM. But what if you need more? Lots more?
Continue reading →

Tagged big data, graph database, neo4j

Jul 03 2012

1 Comment

Neography, Random

Graph Generator

Update: Code to this project is available on Github.

In the US Air Guitar Championships, competitors use their talents to fret on an “invisible” guitar to rock a live crowd and deliver a performance that transcends the imitation of a real guitar and becomes an art form in and of itself. The key factor that determines the winner is having the elusive quality of “Airness“. When considering using Neo4j in a project, one of the key considerations is having a domain model that yields itself to a graph representation. In other words, does your data have “Graphiness“. However, it didn’t dawn on me until recently that when starting a proof of concept, you probably don’t have that data (or enough of it) or maybe your security guys won’t let you within 100 miles of the company production data with this newfangled nosql thingamajig.
Continue reading →

Tagged big data, graph database, neo4j, ruby

Jul 02 2012

6 Comments

Java

Batch Importer – Part 3

At the end of February, we took a look at Michael Hunger’s Batch Importer. It is a great tool to load millions of nodes and relationships into Neo4j quickly. The only thing it was missing was Indexing… I say was, because I just submitted a pull request to add this feature. Let’s go through how it was done so you get an idea of what the Neo4j Batch Import API looks like, and in the next blog post I’ll show you how to generate data to take advantage of it.
Continue reading →

Tagged big data, github, graph database, java, neo4j

Feb 28 2012

32 Comments

Java

Batch Importer – Part 2

If you’ve been following along, we got Michael’s Batch Importer, compiled it, created some test data, ran it and saw millions of nodes and relationships loaded into Neo4j.

So now we’re ready for our own data. I am going to show you how to get data from a Relational Database like PostgreSQL into a format we can use. If you’re using SQL Server, MySQL, Oracle, etc, the directions will be slightly different, but you’ll get the picture.
Continue reading →

Tagged big data, graph database, java, neo4j

Feb 28 2012

46 Comments

Java

Batch Importer – Part 1

Data is everywhere… all around us, but sometimes the medium it is stored in can be a problem when analyzing it. Chances are you have a ton of data sitting around in a relational database in your current application… or you have begged, borrowed or scraped to get the data from somewhere and now you want to use Neo4j to find how this data is related.

Michael Hunger wrote a batch importer to load csv data quickly, but for some reason it hasn’t received a lot of love. We’re going to change that today and I’m going to walk you through getting your data out of tables and into nodes and edges.

Let’s clone the project and jump in.
Continue reading →

Tagged big data, github, graph database, java, neo4j, nosql

Max De Marzi

Graphs, Graphs, and nothing but the Graphs

Tag Archives: big data

Searching for objects using multiple dimensions

Connected

Delivering a Graph Based Search solution to slightly wrong data

Scaling Up

Getting a Big Neo4j Test Box for Cheap!

Graph Generator

Batch Importer – Part 3

Batch Importer – Part 2

Batch Importer – Part 1