Tag Archives: neo4j

Knowledge Bases in Neo4j

cnet5promo

From the second we are born we are collecting a wealth of knowledge about the world. This knowledge is accumulated and interrelated inside our brains and it represents what we know. If we could export this knowledge and give it to a computer, it would look like ConceptNet. ConceptNet is a semantic network that…

…is built from nodes representing concepts, in the form of words or short phrases of natural language, and labeled relationships between them. These are the kinds of things computers need to know to search for information better, answer questions, and understand people’s goals.

Continue reading

Tagged , , , , , , , , ,

Permission Resolution with Neo4j – Part 3

write_automated_test2

Let’s add a couple of performance tests to the mix. We learned about Gatling in a previous blog post, we’re going to use it here again. The first test will randomly choose users and documents (from the graph we created in part 2) and write the results to a file, the second test will re-use the results of the first one and run consistently so we can change hardware, change Neo4j parameters, tune the JVM, etc. and see how they affect our performance.

The full code for the Random Permissions test is here, I’ll just highlight the main parts:
Continue reading

Tagged , , , , , , ,

Permission Resolution with Neo4j – Part 2

the-princess-bride-original3

Let’s try tackling something a little bigger. In Part 1 we created a small graph to test our permission resolution graph algorithm and it worked like a charm on our dozen or so nodes and edges. I don’t have fast hands, so instead of typing out a million node graph, we’ll build a graph generator and use the batch importer to load it into Neo4j. What I want to create is a set of files to feed to the batch-importer.
Continue reading

Tagged , , , , ,

Permission Resolution with Neo4j – Part 1

i_can_haz_permissions

People produce a lot of content. Messages, text files, spreadsheets, presentations, reports, financials, etc, the list goes on. Usually organizations want to have a repository of all this content centralized somewhere (just in case a laptop breaks, gets lost or stolen for example). This leads to some kind of grouping and permission structure. You don’t want employees seeing each other’s HR records, unless they work for HR, same for Payroll, or unreleased quarterly numbers, etc. As this data grows it no longer becomes easy to simply navigate and a search engine is required to make sense of it all.

But what if your search engine returns 1000 results for a query and the user doing the search is supposed to only have access to see 4 things? How do you handle this? Check the user permissions on each file realtime? Slow. Pre-calculate all document permissions for a user on login? Slow and what if new documents are created or permissions change between logins? Does the system scale at 1M documents, 10M documents, 100M documents?
Continue reading

Tagged , , , , ,

A Peek behind the Neo4j Lucene Index Curtain

wizard-of-oz

Did you know you can write Javascript in the Neo4j console to access the Neo4j API?
Try it. Open up your Neo4j Web Admin Console and type:

neo4j-sh (0)$ eval db
EmbeddedGraphDatabase [data/graph.db]

OMG! I know, Neo4j is crazy. So much to play with, I’ve been at it for a few years and I haven’t even dug into this area. What else can we do here?
Continue reading

Tagged , , , , ,

Neo4j and Gatling sitting in a tree, Performance T-E-S-T-ing

neo4j_loves_gatling

I was introduced to the open-source performance testing tool Gatling a few months ago by Dustin Barnes and fell in love with it. It has an easy to use DSL, and even though I don’t know a lick of Scala, I was able to figure out how to use it. It creates pretty awesome graphics and takes care of a lot of work for you behind the scenes. They have great documentation and a pretty active google group where newbies and questions are welcomed.

It ships with Scala, so all you need to do is create your tests and use a command line to execute it. I’ll show you how to do a few basic things, like test that you have everything working, then we’ll create nodes and relationships, and then query those nodes.
Continue reading

Tagged , , , , , , ,

Setting up a Neo4j Cluster on Amazon

There are multiple ways to setup a Neo4j Cluster on Amazon Web Services (AWS) and I want to show you one way to do it.

Overview:

  1. Create a VPC
  2. Launch 1 Instance
  3. Install Neo4j HA
  4. Clone 2 Instances
  5. Configure the Instances
  6. Start the Coordinators
  7. Start the Neo4j Cluster
  8. Create 2 Load Balancers
  9. Next Steps

We’ll start off by logging on to Amazon Web Services and creating a Virtual Private Cloud:


Continue reading

Tagged , , , , , , ,

Pathfinding with Neo4j Unmanaged Extensions

In Extending Neo4j I showed you how to create an unmanaged extension to warm up the node and relationship caches. Let’s try doing something more interesting like exposing the A* (A Star) search algorithm through the REST API. The graph we created earlier looks like this:
Continue reading

Tagged , , , ,

Extending Neo4j

One of the great things about Neo4j is how easy it is to extend it. You can extend Neo4j with Plugins and Unmanaged Extensions. Two great examples of plugins are the Gremlin Plugin (which lets you use the Gremlin library with Neo4j) and the Spatial Plugin (which lets you perform spatial operations like searching for data within specified regions or within a specified distance of a point of interest).

Plugins are meant to extend the capabilities of the database, nodes, or relationships. Unmanaged extensions are meant to let you do anything you want. This great power comes with great responsibility, so be careful what you do here. David Montag cooked up an unmanaged extension template for us to use on github so lets give it a whirl. We are going to clone the project, compile it, download Neo4j, configure Neo4j to use the extension, test the extension and tweak it a bit.
Continue reading

Tagged , , , , , , , , , ,

CrunchBase on Neo4j

NeoTechnology was featured on TechCrunch after raising a Series B round, and it has an entry on CrunchBase. If you look at CrunchBase closely you’ll notice it’s a graph. Who invested in what, who co-invested, what are the common investment themes between investors, how are companies connected by board members, etc. These are questions we can ask of the graph and are well suited for graph databases.
Continue reading

Tagged , , , , , , ,