Building a Twitter Clone with Neo4j – Part Six

We are getting close to wrapping up the back-end API for our Twitter clone, so thank you for sticking with this awfully long series since the beginning. One of the big community features of Twitter is Trending Hashtags. It lets users know what is being talked about even if the people they follow aren’t talking about it. It’s kind of weird that way, since part of the point of Twitter is following just a few hundred or thousand people to reduce the noise, and here we are bringing noise back into our feed. Regardless, this is actually pretty easy to implement, so let’s have a crack at it.

If we take a look at our model again: when a Post is created, we connect it to its Tags with a “dated” relationship type whose name encodes the day (for example, TAGGED_ON_2016-08-01).
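As a quick refresher, the tagging side looks roughly like this at Post creation time (a simplified sketch, not the exact code from the earlier parts):

    // Sketch: when a Post is created, link it to each Tag with a relationship
    // type that encodes the creation date, e.g. TAGGED_ON_2016-08-01.
    RelationshipType taggedOn = RelationshipType.withName("TAGGED_ON_" +
            dateTime.format(dateFormatter));
    post.createRelationshipTo(tag, taggedOn);

We can make use of this to count how many Posts were given a tag on any day using getDegree: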

 
    private static List<Map<String, Object>> getTrends(String key) {
        ArrayList<Map<String, Object>> results = new ArrayList<>();
        LocalDateTime dateTime = LocalDateTime.now(utc);
        // Build today's dated relationship type, e.g. TAGGED_ON_2016-08-01.
        RelationshipType tagged = RelationshipType.withName("TAGGED_ON_" +
                dateTime.format(dateFormatter));
        try (Transaction tx = db.beginTx()) {
            ResourceIterator<Node> tags = db.findNodes(Labels.Tag);
            while (tags.hasNext()) {
                Node tag = tags.next();
                // getDegree returns the count of today's taggings without
                // traversing the relationships themselves.
                int taggings = tag.getDegree(tagged, Direction.INCOMING);
                if (taggings > 0) {
                    HashMap<String, Object> result = new HashMap<>();
                    result.put(NAME, tag.getProperty(NAME));
                    result.put(COUNT, taggings);
                    results.add(result);
                }
            }
            tx.success();
        }

        // Sort by count descending and keep the top ten.
        results.sort(Comparator.comparing(m -> (Integer) m.get(COUNT), reverseOrder()));
        return results.subList(0, Math.min(results.size(), 10));
    }

What’s neat about this way of finding trending tags is what we are not doing. Imagine this in a relational database: we’d have to do a range scan on a tagged_on date column and then perform a group by. Maybe we’d be clever and keep one “taggings” table per day so it doesn’t grow to immense proportions. In Neo4j we take advantage of getDegree, since the relationship counts are effectively pre-grouped for us at write time.

Still, we’re having to look at the counts for all Tags, and we may not want to do that over and over again for every user on practically every screen. This is where the dreaded complexity of “caching” comes into the picture. I’ve seen production systems using four layers of caching, and the invalidation of that data must be a nightmare. To make our lives easier, we’re going to cache right in the database. In Neo4j 3.1, we are bundling the “Caffeine” high-performance caching library by Ben Manes.

 
    private static LoadingCache<String, List<Map<String, Object>>> trends = Caffeine.newBuilder()
            .expireAfterWrite(1, TimeUnit.DAYS)
            .refreshAfterWrite(5, TimeUnit.MINUTES)
            .build(Tags::getTrends);

We are using a LoadingCache that stores just a single entry, populated by the getTrends method we created above. The cached results expire one day after they were last written. Once the results are more than five minutes old, the next request triggers an asynchronous recompute while the “stale” results continue to be served. Our HTTP method then becomes just a request for the results of this cache:

 
    @GET
    public Response getTrends(@Context GraphDatabaseService db) throws IOException {
        List<Map<String, Object>> results;
        try (Transaction tx = db.beginTx()) {
            // The key doesn't matter here; the cache holds a single entry.
            results = trends.get("trends");
        }
        return Response.ok().entity(objectMapper.writeValueAsString(results)).build();
    }

Now, let’s move on to Search. We are going to want to cache search queries as well. I’m going to expire them after a day and serve stale results for up to a minute, using a LoadingCache from Caffeine again.

 
    private static LoadingCache<String, ArrayList<Long>> searches = Caffeine.newBuilder()
            .maximumSize(50_000)
            .expireAfterWrite(1, TimeUnit.DAYS)
            .refreshAfterWrite(1, TimeUnit.MINUTES)
            .build(Search::performSearch);

Now… for strange reasons, we do not have direct access to query the schema indexes in Neo4j via the Java API. We could perform a Cypher query here, but instead I’m going to cheat and use some black magic to get access to the lower-level Neo4j APIs and search for a term using “containsString”:

 
    private static ArrayList<Long> performSearch(String term) throws SchemaRuleNotFoundException, IndexNotFoundKernelException {
        // Grab the kernel statement of the current thread to reach the lower-level APIs.
        ThreadToStatementContextBridge ctx = dbapi.getDependencyResolver().resolveDependency(ThreadToStatementContextBridge.class);
        KernelStatement st = (KernelStatement) ctx.get();
        ReadOperations ops = st.readOperations();
        // Find the schema index on :Post(status) and get a reader for it.
        IndexDescriptor descriptor = ops.indexGetForLabelAndPropertyKey(postLabelId, statusPropertyId);
        IndexReader reader = st.getStoreStatement().getIndexReader(descriptor);

        // Collect the node id of every Post whose status contains the term.
        PrimitiveLongIterator hits = reader.containsString(term);
        ArrayList<Long> results = new ArrayList<>();
        while (hits.hasNext()) {
            results.add(hits.next());
        }
        return results;
    }
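
The postLabelId and statusPropertyId used above are the internal token ids for the Post label and the status property key. Here is a minimal sketch of how they might be resolved, assuming the token lookup methods on ReadOperations (resolveTokens is a hypothetical helper; in the real code these fields are presumably initialized once elsewhere):

    // Hypothetical sketch: resolve the internal token ids once up front.
    private static int postLabelId;
    private static int statusPropertyId;

    private static void resolveTokens(ReadOperations ops) {
        postLabelId = ops.labelGetForName(Labels.Post.name());   // label id for :Post
        statusPropertyId = ops.propertyKeyGetForName("status");  // property key id for "status"
    }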

This part gets a bit tricky because we are caching the results of the search, but we can’t cache how the results are displayed to a User. A Post returned in our search results may have been liked or reposted by the User searching, or by other users. To account for that, as you can see above, we only cache the node ids of the search results in an ArrayList of Longs. We need to “hydrate” these node ids into Nodes and then perform more work: we need to be able to paginate, and then get additional data for each result. The HTTP method is a bit long and slightly messy.

 
    @GET
    public Response getSearch(@QueryParam("q") final String q,
                              @QueryParam("limit") @DefaultValue("25") final Integer limit,
                              @QueryParam("since") final Long since,
                              @QueryParam("username") final String username,
                              @Context GraphDatabaseService db) throws SchemaRuleNotFoundException, IndexNotFoundKernelException, NoSuchMethodException, IOException {
        ArrayList<Map<String, Object>> results = new ArrayList<>();
        Long latest = getLatestTime(since);

        try (Transaction tx = db.beginTx()) {
            final Node user = Users.findUser(username, db);
            ArrayList<Long> postIds = searches.get(q);
            // Hydrate the cached node ids into Nodes, ordered newest first.
            Queue<Node> posts = new PriorityQueue<>(Comparator.comparing(m -> (Long) m.getProperty(TIME), reverseOrder()));

            postIds.forEach(postId -> {
                Node post = db.getNodeById(postId);
                Long time = (Long) post.getProperty(TIME);
                // Only keep Posts older than our pagination cutoff.
                if (time < latest) {
                    posts.add(post);
                }
            });

            int count = 0;
            while (count < limit && !posts.isEmpty()) {
                count++;
                Node post = posts.poll();
                Map<String, Object> properties = post.getAllProperties();
                Node author = getAuthor(post, (Long) properties.get(TIME));

                properties.put(USERNAME, author.getProperty(USERNAME));
                properties.put(NAME, author.getProperty(NAME));
                properties.put(HASH, author.getProperty(HASH));
                properties.put(LIKES, post.getDegree(RelationshipTypes.LIKES));
                properties.put(REPOSTS, post.getDegree(Direction.INCOMING)
                                        - 1 // for the Posted Relationship Type
                                        - post.getDegree(RelationshipTypes.LIKES)
                                        - post.getDegree(RelationshipTypes.REPLIED_TO));
                // Personalize the results for the searching User, if any.
                if (user != null) {
                    properties.put(LIKED, userLikesPost(user, post));
                    properties.put(REPOSTED, userRepostedPost(user, post));
                }
                results.add(properties);
            }
        }

        return Response.ok().entity(objectMapper.writeValueAsString(results)).build();
    }
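
The getAuthor helper isn’t shown in this post. Here is a minimal sketch of how it could work, assuming the dated POSTED_ON relationship pattern from earlier parts of the series and a “time” property stored as epoch seconds (the body is illustrative, not the actual implementation):

    // Illustrative sketch: derive the dated relationship type from the Post's
    // timestamp, then follow it back to the User who wrote the Post.
    private static Node getAuthor(Node post, Long time) {
        // Assumes "time" is epoch seconds in UTC.
        LocalDateTime postedDateTime = LocalDateTime.ofEpochSecond(time, 0, ZoneOffset.UTC);
        RelationshipType postedOn = RelationshipType.withName("POSTED_ON_" +
                postedDateTime.format(dateFormatter));
        return post.getSingleRelationship(postedOn, Direction.INCOMING).getStartNode();
    }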

Alright, I think this is the last one. We need a way to see recent Posts in order to find people to follow. We could index the “time” property of Posts and search for nodes that were created recently; that would be the traditional way of doing it. However, let me show you another way. We are going to ask the graph for the highest node id in use. Then we walk that number down, checking as we go whether each node id belongs to a Post. Instead of ignoring non-Post nodes we could also show newly created Tags and newly registered Users, but for now we will keep it simple and wrap up this API.

 
    ...
    try (Transaction tx = db.beginTx()) {
        final Node user = Users.findUser(username, db);

        // Reach into the record storage engine to find the highest node id in use.
        RecordStorageEngine recordStorageEngine = dbapi.getDependencyResolver()
                .resolveDependency(RecordStorageEngine.class);
        StoreAccess storeAccess = new StoreAccess(recordStorageEngine.testAccessNeoStores());
        Long highId = storeAccess.getRawNeoStores().getNodeStore().getHighestPossibleIdInUse();

        int counter = 0;
        while (counter < limit && highId > -1) {
            // Note: getNodeById throws NotFoundException for deleted ids,
            // so production code would need to guard against that.
            Node post = db.getNodeById(highId);
            highId--;
            if (post.getLabels().iterator().next().name().equals(Labels.Post.name())) {
                Long time = (Long) post.getProperty("time");
                if (time < latest) {
                ...

The source code is on GitHub. I didn’t have the space and time to show you all the tests for these methods, but you can find them in the test folder of the code. I even managed to maintain almost 95% test code coverage. We’re going to take a little break, do a few other things, and then start on the front end of our Twitter clone. Stay tuned for more blog posts soon. On to part 7.
