Counting Nodes with Multiple Labels

We have over 6000 users in our #neo4j-users slack channel and get all kinds of requests. About a month ago Thomas Shields asked:

Should counting the set of things with 2 labels really take so long? I’ve got 48M nodes with LabelA and LabelB and the query `MATCH (n:LabelA:LabelB) RETURN COUNT(n)` is taking 80-90 seconds

Let’s see what’s going on by creating a small version of his graph. We will create 1M nodes of LabelA, then 1M nodes with both LabelA and LabelB, and then 1M nodes with just Label B:

 
WITH ["Jennifer","Michelle","Tanya","Julie","Christie","Sophie","Amanda","Khloe","Sarah","Kaylee"] AS names 
FOREACH (r IN range(0,1000000) | CREATE (:LabelA {username:names[r % size(names)]+r}))

WITH ["Jennifer","Michelle","Tanya","Julie","Christie","Sophie","Amanda","Khloe","Sarah","Kaylee"] AS names 
FOREACH (r IN range(0,1000000) | CREATE (:LabelA:LabelB {username:names[r % size(names)]+r}))

WITH ["Jennifer","Michelle","Tanya","Julie","Christie","Sophie","Amanda","Khloe","Sarah","Kaylee"] AS names 
FOREACH (r IN range(0,1000000) | CREATE (:LabelB {username:names[r % size(names)]+r}))

When we try to profile the query we see the execution plan:

This is taking about 240ms with our simple dataset. So we are going to try to speed it up by building a cypher stored procedure. We will make it dynamic so we can take two or more labels. We’ll use my favorite library Roaring Bitmap, and create an array of them. Then for each Label, we will add the nodeId to the corresponding bitmap, and at the end we will aggregate them all and get the cardinality to get the count. The whole procedure is below:

 
    @Procedure(name = "com.maxdemarzi.multiple_label_count", mode = Mode.DEFAULT)
    @Description("CALL com.maxdemarzi.multiple_label_count([labels]")
    public Stream<LongResult> MultipleLabelCount(@Name("labels") List<String> labels) {
        RoaringBitmap[] bitmaps = new RoaringBitmap[labels.size()];

        int count = 0;
        for (String label : labels) {
            int finalCount = count;
            RoaringBitmap bitmap = new RoaringBitmap();
            bitmaps[count] = bitmap;
            db.findNodes(Label.label(label)).stream().forEach(node -> bitmaps[finalCount].add(((int) node.getId())));
            count++;
        }

        return Stream.of(new LongResult(FastAggregation.and(bitmaps).getLongCardinality()));
    }

We will compile this procedure, add the jar to the plugins folder of Neo4j, and allow it by adding the following line to the neo4j.conf file:

 
dbms.security.procedures.unrestricted=com.maxdemarzi.*

Now we restart the database and call the procedure:

 
CALL com.maxdemarzi.multiple_label_count(['LabelA','LabelB']) YIELD value RETURN value

We get the same result and our time is down to about 90ms from 240ms. Thomas built his own version for just two labels and was able to get his performance from 90 seconds down to 10 seconds, for almost an order of magnitude improvement.

The source code as always is on github. Feel free to grab it if you need something like it.

If you run into trouble with Neo4j, please join our user-slack channel and give us a chance to help you. We have dedicated Neo4j employees answering questions as well as our awesome community members helping one another. There are dozens of channels, anything from #help-newbies for basic questions, to #lang-fr for French speakers, #neo4j-drupal for old school PHP developers, #neo4j-graphql for bleeding edge hipster developers and everything in between.

Stop by and say hello!

Tagged , , , , , ,

2 thoughts on “Counting Nodes with Multiple Labels

  1. […] De Marzi wrote a blog post in which he shows how to write a stored procedure which uses the Roaring Bitmap library to do fast […]

  2. KRZYSZTOF STENCEL says:

    I don’t get it. Do you propose to use an imperative code instead of declarative Cypher? It looks like a step backwards.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: