Stored Procedures in Neo4j Are NOT Evil

Some of you may be too young to remember this, but writing your application logic in Stored Procedures was all the rage back in the day (PG devs you wouldn’t understand). That is mostly because they were typically written by consultants from IBM and Oracle that cost a fortune and told you to do it this way. Eventually people wised up and realized they (both) sucked and as all things in Software Development, the pendulum swung the other way and people started treating databases like dumb storage. A 15 year old blog post from Jeff Atwood lists some of the problems:

Written in T-SQL/PL-SQL, not a real programming language.
Cannot be debugged in the same IDE
Little to no feedback when things go wrong
Can’t pass objects.
No idea what a proc is doing

Luckily for us, things are a little bit different in Neo4j. We can write our stored procedures in Java (or any language on the JVM). You can use IntelliJ or whatever IDE that supports Java to build your application and your stored procedure. You have the ability to log errors or return errors if you wish. You can pass maps of objects as parameters to Neo4j stored procedures. You have full access to a debugger to walk through the stored procedure line by line.

We already have an example of building an app using Unmanaged Extensions, and I walked you through updating it to work on 4.0. We have plenty of examples on how to build apps using Cypher… we don’t have any examples of using stored procedures to build an app in 4.0… so let’s do that. The quick can follow along and the brave can jump ahead with this github repository which has the code written so far.

Let’s start with our pom.xml file. The following are the current Neo4j versions, we’ll put them here so we can update them easily as things progress.

 
    <properties>
        <neo4j.version>4.0.3</neo4j.version>
        <driver.version>4.0.1</driver.version>
        <junit-jupiter.version>5.5.2</junit-jupiter.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

First thing we need is our Neo4j dependency. It’s scope is “provided” since the stored procedures will be run directly on the Neo4j servers, they have everything they need there.

 
        <dependency>
            <groupId>org.neo4j</groupId>
            <artifactId>neo4j</artifactId>
            <version>${neo4j.version}</version>
            <scope>provided</scope>
        </dependency>

We are going to be testing our stored procedures as we go because even thought this is just a sample application I want you to do as I do. To that end, add this test-harness in the “test” scope and we’ll have what we need.

 
        <dependency>
            <groupId>org.neo4j.test</groupId>
            <artifactId>neo4j-harness</artifactId>
            <version>${neo4j.version}</version>
            <scope>test</scope>
        </dependency>

We will use the harness to create in memory instances of Neo4j to run queries against… but to actually send the queries, we need the driver, so let’s bring that in as well.

 
        <dependency>
            <groupId>org.neo4j.driver</groupId>
            <artifactId>neo4j-java-driver</artifactId>
            <version>${driver.version}</version>
            <scope>test</scope>
        </dependency>

Lastly, we’ll use jUnit to run our tests:

 
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter-engine</artifactId>
            <version>${junit-jupiter.version}</version>
            <scope>test</scope>
        </dependency>

That was pretty ugly, but you only have to do this once and you can copy and paste to every new project. Now let’s get on with it. We’ll create a package “me.tucu” because that’s the convention which doesn’t quite work when you use the domain extension as part of your name, but whatever. Typically I like to start with the Schema, so I tend to create a sub package for that and write the Labels of our graph. I keep my enums in alphabetical order because I’m not a heretic and so should you. We have Conversations where our Users will be able to talk privately with each other, more on that in another post. We have Posts, Products, Users and Tags. We may end up with more along the way. I had a brief moment where I thought maybe I should have a different Label for Post that promote Products… call them “Advertisements” or something, but then thought better of it. Try to stick to a single Label per Node if you can.

 
public enum Labels implements Label {
    Conversation,
    Post,
    Product,
    User,
    Tag
}

Speaking of Promotes, we need relationship types. Most of these are self explanatory with the exception of MUTES. One thing I’ve noticed lately on Twitter is that people tend to be “proud” of getting blocked by some users. They take a picture of the blocked message and see it as a badge of honor of some kind. That sounds like behavior we don’t want, so instead we will MUTE users where they will silently not bother people. I’m thinking about expanding MUTES so that it mutes people to all the people that FOLLOW you as well. So it has more of a social consequence beyond one User. What do you think? Maybe we can try that.

 
public enum RelationshipTypes implements RelationshipType {
    FOLLOWS,
    LIKES,
    MUTES,
    PROMOTES,
    REPLIED_TO,
    REPOSTED
}

Notice above we don’t have any of the “dated relationships” because these will be dynamically created so no point in making a shortcut for them. I also like to keep the names of my properties in a utility class so I don’t make any typos. You know this is kinda neurotic but it has burned my ass a few times, so now I just do it out of habit.

 
public final class Properties {

    private Properties() {
        throw new IllegalAccessError("Utility class");
    }
    public static final String EMAIL = "email";
    public static final String GOLD = "gold";
    public static final String ID = "id";
    ...

Alright onto our first stored procedure which is actually creating the Schema of our graph. This is where I stick all the Constraints and Indexes I will need in the graph. So let’s create a Schema class and add the graph database in the Context when it this stored procedure gets invoked. If we have any errors we will want to log them, so we’ll grab that for our context too.

 
public class Schema {

    @Context
    public GraphDatabaseService db;

    @Context
    public Log log;

A stored procedure needs two things. One is a name and the other is a mode. There are four modes: Read, Write, Schema and DBMS. The first two are self explanatory, Schema mode can read but it is really there to let you add and remote indexes and constraints. The last one is for user and query management, it doesn’t have access to the graph data.

 
    @Procedure(name = "me.tucu.schema.create", mode = Mode.SCHEMA)
    @Description("CALL me.tucu.schema.create() - create schema")
    public Stream<StringResult> create() {
        ArrayList<String> results = new ArrayList<>();

Our procedure is going to return a Stream of String Results. We’ll need to create that as well. I create a results package and stick them all in there. You can pretty much use the APOC result types as examples. Just copy and paste the ones you need.

 
public class StringResult {
    public final static StringResult EMPTY = new StringResult(null);

    public final String value;

    public StringResult(String value) {
        this.value = value;
    }
}

Back to our Schema create procedure. Here we grab a transaction in a try block. I check to see if I already created any indexes or constraints for the Label I am interested in, and if I haven’t yet, then I go ahead and create them. Let me give you an example of a Constraint for User. We know “usernames” have to be unique in the system, so we’ll need to create a constraint for them.

 
        try (Transaction tx = db.beginTx()) {
            org.neo4j.graphdb.schema.Schema schema = tx.schema();
            if (!schema.getConstraints(Labels.User).iterator().hasNext()) {
                schema.constraintFor(Labels.User)
                        .assertPropertyIsUnique(USERNAME)
                        .create();
                tx.commit();
                results.add("(:User {username}) constraint created");
            }
        }

Now an example of an index for Post. It’s pretty cool how you can use the “.on(String)” method twice for a composite index.

 
        // We have a need to quickly find out if a user Re-advertises an Advertisement
        try (Transaction tx = db.beginTx()) {
            org.neo4j.graphdb.schema.Schema schema = tx.schema();
            if (!schema.getIndexes(Labels.Post).iterator().hasNext()) {
                schema.indexFor(Labels.Post).on(USERNAME).on(POST_ID).create();
                tx.commit();
                results.add("(:Post {username, post_id}) index created");
            }
        }

In this case we want to find a Post that is really a Repost that is really a Repost of an Advertisement. We want to see if a User has already created a Repost and reject their request to Repost it again. What we do is make use of composite indexes and create an index on both the “username” and “post_id” property, where we will stick the user that created this Repost and the original post id of the Post that advertised a Product.

You may be wondering… how did I know I was going to need this index already when I haven’t really figured out what the front end is going to look like and what my queries will be? Fair point. When you build your apps I want you to write out the important queries so you can make sure your model will work with them. Take my modeling class, we go over this in detail then. However we kinda already wrote half this app when we cloned Twitter, so I already knew I needed to know if a User has already reposted something or not.

In the old app, it was a matter of checking a direct relationship between the User and the Post they REPOSTED, but since I need to build a “repost tree” instead of using a repost relationship for “Advertisements” it would be easier to add the post_id and user_id properties to the Repost and get to them quickly in an index. It will make more sense when we get to that part of the code. The stored procedure ends by returning a stream of String Results.

 
return results.stream().map(StringResult::new);

Let’s make sure this works. In our “test” directory we will create another “me.tucu” package and create a SchemaTest class that looks like this:

 
public class SchemaTests {

    private static Neo4j neo4j;

    @BeforeAll
    static void initialize() {
        neo4j = Neo4jBuilders.newInProcessBuilder()
                // disabling http server to speed up start
                .withDisabledServer()
                .withProcedure(Schema.class)
                .build();
    }

Here we are building an in-memory neo4j, disabling a few things to speed it starting up, adding the Schema procedure we created to it and building it. Now we are ready to make our first test “shouldCreateSchema”. It gets a Driver in a try block from the server uri and without any encryption. Then it creates a session and runs the cypher that calls the stored procedure into a Result. Then we test the result. I’m being lazy here and just testing the size of the list returned is the current number of constraints and indexes I create. Once the app is more solid, I’ll write the actual list of things I expect and compare against that. We’ll see some of that in later posts.

 
    @Test
    void shouldCreateSchema()
    {
        // In a try-block, to make sure we close the driver after the test
        try( Driver driver = GraphDatabase.driver( neo4j.boltURI() , Config.builder().withoutEncryption().build() ) )
        {
            // Given I've started Neo4j with the procedure
            //       which my 'neo4j' rule above does.
            Session session = driver.session();

            // When I use the procedure
            Result result = session.run( "CALL me.tucu.schema.create();");

            // Then I should get what I expect
            assertThat(result.list().size(), equalTo(4));
        }
    }

Now we can run our test and make sure it works. If there are any issues we can jump into the debugger and walk line by line to see what’s going on. Working with stored procedures in Neo4j is just like working with any other Java code. We’ll stop here and start up again next time. I’m still undecided if we should do a full walkthrough of most of the lines of code or just highlight the “interest parts” (well as interesting as Java code can be). Let me a comment with your thoughts and if you missed it, here is the code so far. On to the next blog post.

Max De Marzi

Graphs, Graphs, and nothing but the Graphs