Building a Twitter Clone with Neo4j – Part Four

We left off last time having just added the ability to follow people, see who we follow and who follows us, block and unblock people, and finally see whom we have put on our naughty list of blocked users. So we have a social network where people can create relationships, but they have nothing to say because we haven’t implemented posts yet!

We’ll begin with createPost. The input comes in as a JSON blob, so off it goes to get validated. Assuming it validates, we get the current time and go off searching for the user who created the post. We then create the post; I’ve moved that off to another method that I’ll show below. We’ll grab the properties of the post and add a few properties from the author so it can be displayed properly. I’m cheating here a bit and hard-coding zeros for REPOSTS and LIKES: the post was just created, so there is no way somebody saw it and acted on it yet.

    @POST
    public Response createPost(String body, @PathParam("username") final String username,
                               @Context GraphDatabaseService db) throws IOException {
        Map<String, Object> results;
        HashMap<String, Object> input = PostValidator.validate(body);
        LocalDateTime dateTime = LocalDateTime.now(utc);

        try (Transaction tx = db.beginTx()) {
            Node user = Users.findUser(username, db);
            Node post = createPost(db, input, user, dateTime);
            results = post.getAllProperties();
            results.put(USERNAME, username);
            results.put(NAME, user.getProperty(NAME));
            results.put(HASH, user.getProperty(HASH));
            results.put(REPOSTS, 0);
            results.put(LIKES, 0);
            tx.success();
        }
        return Response.ok().entity(objectMapper.writeValueAsString(results)).build();
    }

The actual createPost method sets our status and time, connects the post to the user with the dated “POSTED_ON_YYYY_MM_DD” relationship type, and adds the current time to the relationship itself. This is also the time to deal with Tags and Mentions in the post, which we can take a look at next.

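    // Private helper: no @Context needed here, and the map is typed to match the validator's output.
    private Node createPost(GraphDatabaseService db, HashMap<String, Object> input, Node user, LocalDateTime dateTime) {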
        Node post = db.createNode(Labels.Post);
        post.setProperty(STATUS, input.get("status"));
        post.setProperty(TIME, dateTime.toEpochSecond(ZoneOffset.UTC));
        Relationship r1 = user.createRelationshipTo(post, RelationshipType.withName("POSTED_ON_" +
                        dateTime.format(dateFormatter)));
        r1.setProperty(TIME, dateTime.toEpochSecond(ZoneOffset.UTC));
        Tags.createTags(post, input, dateTime, db);
        Mentions.createMentions(post, input, dateTime, db);
        return post;
    }

Our createTags method looks for any strings that match the pattern “#something”. We take the input, bring it down to lower case, and then for each match of the pattern we either find or create the tag node and connect it to the Post. We keep track of the tags we have already connected, so if a user posts the same tag twice, we only add it once.

    private static final Pattern hashtagPattern = Pattern.compile("#(\\S+)");

    public static void createTags(Node post, HashMap<String, Object>  input, LocalDateTime dateTime, GraphDatabaseService db) {
        Matcher mat = hashtagPattern.matcher(((String)input.get("status")).toLowerCase());
        for (Relationship r1 : post.getRelationships(Direction.OUTGOING,
                                    RelationshipType.withName("TAGGED_ON_" +
                dateTime.format(dateFormatter)))) {
            r1.delete();
        }
        Set<Node> tagged = new HashSet<>();
        while (mat.find()) {
            String tag = mat.group(1);
            Node hashtag = db.findNode(Labels.Tag, NAME, tag);
            if (hashtag == null) {
                hashtag = db.createNode(Labels.Tag);
                hashtag.setProperty(NAME, tag);
                hashtag.setProperty(TIME, dateTime.truncatedTo(ChronoUnit.DAYS).toEpochSecond(ZoneOffset.UTC));
            }
            if (!tagged.contains(hashtag)) {
                post.createRelationshipTo(hashtag, RelationshipType.withName("TAGGED_ON_" +
                        dateTime.format(dateFormatter)));
                tagged.add(hashtag);
            }
        }
    }
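
To see the matching and de-duplication on their own, here is a minimal sketch in plain Java with no Neo4j involved. The class and method names are made up for illustration; only the pattern and the lower-casing come from createTags above.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractor {
    // Same pattern as createTags: '#' followed by one or more non-whitespace characters.
    private static final Pattern HASHTAG_PATTERN = Pattern.compile("#(\\S+)");

    // Returns the distinct, lower-cased tags in the order they first appear.
    static Set<String> extractTags(String status) {
        Set<String> tags = new LinkedHashSet<>();
        Matcher mat = HASHTAG_PATTERN.matcher(status.toLowerCase());
        while (mat.find()) {
            tags.add(mat.group(1)); // adding to a Set silently ignores repeats
        }
        return tags;
    }

    public static void main(String[] args) {
        // "#Neo4j" and "#neo4j" collapse into a single tag after lower-casing.
        System.out.println(extractTags("Loving #Neo4j and #graphs because #neo4j rocks"));
        // prints [neo4j, graphs]
    }
}
```

The real method tracks Nodes instead of Strings, but the effect is the same: one TAGGED_ON relationship per distinct tag.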

You’ll notice I am using a “dated” relationship type for TAGGED_ON. This is because I want to be able to find recently tagged posts from a Tag. One neat thing about having the time on the Post node is that I can use it to know which dated TAGGED_ON relationship to take without having to try them all. The Tag itself got a time as well, but we truncated it to the earliest time in the day. We can use this when looking at a Tag page, by traversing TAGGED_ON_ relationships only as far back in the past as the day the tag was created.
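
As a rough sketch of that walk back in time, the snippet below lists the dated relationship type names a Tag page would try, from a starting time back to the day the tag was created. It assumes dateFormatter produces yyyy_MM_dd (my reading of the “POSTED_ON_YYYY_MM_DD” naming); the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

class DatedTypes {
    // Assumption: the project's dateFormatter uses this pattern.
    private static final DateTimeFormatter DATE_FORMATTER = DateTimeFormatter.ofPattern("yyyy_MM_dd");

    // Lists TAGGED_ON_ type names from the day of `since` back to the day the tag was created.
    static List<String> taggedOnTypes(long sinceEpochSecond, long tagCreatedEpochSecond) {
        LocalDateTime day = LocalDateTime.ofEpochSecond(sinceEpochSecond, 0, ZoneOffset.UTC)
                .truncatedTo(ChronoUnit.DAYS);
        LocalDateTime earliest = LocalDateTime.ofEpochSecond(tagCreatedEpochSecond, 0, ZoneOffset.UTC)
                .truncatedTo(ChronoUnit.DAYS);
        List<String> names = new ArrayList<>();
        while (!day.isBefore(earliest)) {
            names.add("TAGGED_ON_" + day.format(DATE_FORMATTER));
            day = day.minusDays(1); // step one day further into the past
        }
        return names;
    }

    public static void main(String[] args) {
        long since = 1491181200L;      // 2017-04-03 01:00:00 UTC
        long tagCreated = 1491004800L; // 2017-04-01 00:00:00 UTC
        System.out.println(taggedOnTypes(since, tagCreated));
        // prints [TAGGED_ON_2017_04_03, TAGGED_ON_2017_04_02, TAGGED_ON_2017_04_01]
    }
}
```

Because both ends are truncated to the start of the day, the creation day itself is always included in the walk.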

…and as I write this I just realized I forgot to add a time property to User nodes which would help me stop trying to find POSTED_ON relationships created before the User. We will also use this trick in our Timeline to not get Posts created before the User was created.

Alright, what about Mentions?

Well, it is the same idea as Tags, except we are looking for “@” and the user has to actually exist. Our MENTIONED_ON relationship will also be dated, since Users will want to know who mentioned them recently. We keep track of mentions to make sure we don’t create two relationships if a user is mentioned twice in one post.

    private static final Pattern mentionsPattern = Pattern.compile("@(\\S+)");

    public static void createMentions(Node post, HashMap<String, Object> input, LocalDateTime dateTime, GraphDatabaseService db) {
        Matcher mat = mentionsPattern.matcher(((String)input.get("status")).toLowerCase());

        for (Relationship r1 : post.getRelationships(Direction.OUTGOING, RelationshipType.withName("MENTIONED_ON_" +
                dateTime.format(dateFormatter)))) {
            r1.delete();
        }

        Set<Node> mentioned = new HashSet<>();
        while (mat.find()) {
            String username = mat.group(1);
            Node user = db.findNode(Labels.User, USERNAME, username);
            if (user != null && !mentioned.contains(user)) {
                Relationship r1 = post.createRelationshipTo(user, RelationshipType.withName("MENTIONED_ON_" +
                        dateTime.format(dateFormatter)));
                r1.setProperty(TIME, dateTime.toEpochSecond(ZoneOffset.UTC));
                mentioned.add(user);
            }
        }
    }
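
One thing worth knowing about the `@(\\S+)` pattern is that `\S+` grabs everything up to the next whitespace, so trailing punctuation ends up in the captured username and the user lookup will quietly come back empty. The sketch below compares it against a stricter word-only pattern. That second pattern is my own hypothetical alternative, not something from this project, and it would also cut off usernames containing dots or dashes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MentionPatterns {
    // The pattern used in createMentions.
    static final Pattern ORIGINAL = Pattern.compile("@(\\S+)");
    // Hypothetical alternative: letters, digits and underscores only.
    static final Pattern WORD_ONLY = Pattern.compile("@(\\w+)");

    // Returns every captured username, lower-cased, in order of appearance.
    static List<String> capture(Pattern pattern, String status) {
        List<String> found = new ArrayList<>();
        Matcher mat = pattern.matcher(status.toLowerCase());
        while (mat.find()) {
            found.add(mat.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String status = "Thanks @Max, say hi to @jexp!";
        System.out.println(capture(ORIGINAL, status));  // prints [max,, jexp!]
        System.out.println(capture(WORD_ONLY, status)); // prints [max, jexp]
    }
}
```

Which pattern is right depends on what characters the username validation allows, so treat this as something to check rather than a drop-in fix.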

Alright, moving right along. I also need to be able to see the posts users have created. Like before, we will use limit and since to paginate through our getPosts method. I’ll find the user first and get their “born on date” so we can stop looking for any Posts earlier than that. Then, while under our limit of posts, we keep going backwards in time one day at a time, using the dated POSTED_ON relationship types, until we get our fill or run “out of time”. Much like what we did in getProfile, I’ll grab some metrics of our Post using getDegree to show the number of Likes and Reposts.

    @GET
    public Response getPosts(@PathParam("username") final String username,
                             @QueryParam("limit") @DefaultValue("25") final Integer limit,
                             @QueryParam("since") final Long since,
                             @Context GraphDatabaseService db) throws IOException {
        ArrayList<Map<String, Object>> results = new ArrayList<>();
        LocalDateTime dateTime;
        if (since == null) {
            dateTime = LocalDateTime.now(utc);
        } else {
            dateTime = LocalDateTime.ofEpochSecond(since, 0, ZoneOffset.UTC);
        }
        Long latest = dateTime.toEpochSecond(ZoneOffset.UTC);

        try (Transaction tx = db.beginTx()) {
            Node user = Users.findUser(username, db);
            Map userProperties = user.getAllProperties();
            LocalDateTime earliest = LocalDateTime.ofEpochSecond((Long)userProperties.get(TIME), 0, ZoneOffset.UTC);
            int count = 0;
            while (count < limit && (dateTime.isAfter(earliest))) {
                RelationshipType relType = RelationshipType.withName("POSTED_ON_" +
                        dateTime.format(dateFormatter));

                for (Relationship r1 : user.getRelationships(Direction.OUTGOING, relType)) {
                    Node post = r1.getEndNode();
                    Map<String, Object> result = post.getAllProperties();
                    // Use the same TIME constant the relationship property was written with.
                    Long time = (Long) r1.getProperty(TIME);
                    if(time < latest) {
                        result.put(TIME, time);
                        result.put(USERNAME, username);
                        result.put(NAME, userProperties.get(NAME));
                        result.put(HASH, userProperties.get(HASH));
                        result.put(LIKES, post.getDegree(RelationshipTypes.LIKES));
                        result.put(REPOSTS, post.getDegree(Direction.INCOMING)
                                - 1 // for the Posted Relationship Type
                                - post.getDegree(RelationshipTypes.LIKES)
                                - post.getDegree(RelationshipTypes.REPLIED_TO));

                        results.add(result);
                        count++;
                    }
                }
                dateTime = dateTime.minusDays(1);
            }
            tx.success();
        }

        results.sort(Comparator.comparing(m -> (Long) m.get(TIME), reverseOrder()));

        return Response.ok().entity(objectMapper.writeValueAsString(results)).build();
    }
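
Here is a toy version of that day-by-day walk, with an in-memory map standing in for the dated relationships. It is only a sketch: the names are hypothetical and the date format is assumed to be yyyy_MM_dd. It mirrors the loop above, including the fact that a whole day is always consumed, so a page can come back with more than limit posts.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PostPager {
    // Assumption: the project's dateFormatter uses this pattern.
    static final DateTimeFormatter DATE_FORMATTER = DateTimeFormatter.ofPattern("yyyy_MM_dd");

    // postsByType maps a dated type name to the post times (epoch seconds) stored under it.
    static List<Long> page(Map<String, List<Long>> postsByType, long since, long userCreated, int limit) {
        LocalDateTime dateTime = LocalDateTime.ofEpochSecond(since, 0, ZoneOffset.UTC);
        LocalDateTime earliest = LocalDateTime.ofEpochSecond(userCreated, 0, ZoneOffset.UTC);
        List<Long> results = new ArrayList<>();
        while (results.size() < limit && dateTime.isAfter(earliest)) {
            String relType = "POSTED_ON_" + dateTime.format(DATE_FORMATTER);
            for (Long time : postsByType.getOrDefault(relType, Collections.emptyList())) {
                if (time < since) { // strictly older than the page marker
                    results.add(time);
                }
            }
            dateTime = dateTime.minusDays(1);
        }
        results.sort(Comparator.reverseOrder()); // newest first
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> posts = new HashMap<>();
        posts.put("POSTED_ON_2017_04_03", Arrays.asList(1491177700L, 1491177800L));
        posts.put("POSTED_ON_2017_04_02", Arrays.asList(1491091250L));
        posts.put("POSTED_ON_2017_04_01", Arrays.asList(1491004810L));
        long userCreated = 1491004800L; // 2017-04-01 00:00:00 UTC

        List<Long> first = page(posts, 1491181200L, userCreated, 2);
        System.out.println(first);
        // The oldest time on a page becomes the "since" of the next request.
        long nextSince = first.get(first.size() - 1);
        System.out.println(page(posts, nextSince, userCreated, 2));
    }
}
```

Since the comparison is strictly less than, passing the oldest time of one page as the next since does not repeat that post.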

Alright… now one of the moments we’ve all been waiting for… but first let’s hear the setup:

https://twitter.com/dtunkelang/status/848613528919396352

If you’ve been paying attention since part one, you know the reason. Every time a user tweets, their tweet has to be added to the timelines of everyone who is following them. In the case of celebrities, this is millions of writes. If they are allowed to edit their tweets, it’s going to make a giant mess, with inconsistencies because some people will see the old post and others the new one until the millions of writes converge.

But Neo4j doesn’t care because we don’t do that. Every post is picked up on a user’s timeline in real time, so if it gets edited, everyone seeing their timeline from then on will see the new version of the post. That is a huge advantage Neo4j has over other database systems when building social networks or any kind of subgraph-generated content.
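
To make the contrast concrete, here is a toy in-memory sketch of building a timeline at read time. It is plain Java with no Neo4j, and every name in it is made up for illustration. An edit is one write to the author’s own post, and every follower sees it the next time their timeline is assembled.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class ReadTimeTimeline {
    private final Map<String, Set<String>> follows = new HashMap<>();
    private final Map<String, TreeMap<Long, String>> posts = new HashMap<>();

    void follow(String follower, String followed) {
        follows.computeIfAbsent(follower, k -> new HashSet<>()).add(followed);
    }

    // Creating and editing are the same single write, keyed by author and time.
    void savePost(String author, long time, String status) {
        posts.computeIfAbsent(author, k -> new TreeMap<>()).put(time, status);
    }

    // Assembled on every read from whatever the followed authors currently have, newest first.
    List<String> timeline(String user) {
        List<Map.Entry<Long, String>> entries = new ArrayList<>();
        for (String author : follows.getOrDefault(user, Collections.emptySet())) {
            for (Map.Entry<Long, String> e : posts.getOrDefault(author, new TreeMap<>()).entrySet()) {
                entries.add(new AbstractMap.SimpleEntry<>(e.getKey(), author + ": " + e.getValue()));
            }
        }
        entries.sort(Map.Entry.<Long, String>comparingByKey().reversed());
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> e : entries) {
            result.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        ReadTimeTimeline app = new ReadTimeTimeline();
        app.follow("alice", "bob");
        app.follow("carol", "bob");
        app.savePost("bob", 1L, "helo world");
        app.savePost("bob", 1L, "hello world"); // the edit: one write, no fan-out
        System.out.println(app.timeline("alice"));
        System.out.println(app.timeline("carol"));
    }
}
```

With a million followers instead of two, the edit is still a single write; the cost moves to the read side instead.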

To update our posts, we will find the user doing the updating, validate our new post is legit, then find the post we are updating (more on that in a second) and proceed as with the createPost method.

    @PUT
    @Path("/{time}")
    public Response updatePost(String body,
                               @PathParam("username") final String username,
                               @PathParam("time") final Long time,
                               @Context GraphDatabaseService db) throws IOException {
        Map<String, Object> results;
        HashMap<String, Object> input = PostValidator.validate(body);

        try (Transaction tx = db.beginTx()) {
            Node user = Users.findUser(username, db);
            Node post = getPost(user, time);
            post.setProperty(STATUS, input.get(STATUS));
            LocalDateTime dateTime = LocalDateTime.ofEpochSecond(
                                           (Long)post.getProperty(TIME), 0, ZoneOffset.UTC);
            Tags.createTags(post, input, dateTime, db);
            Mentions.createMentions(post, input, dateTime, db);

            results = post.getAllProperties();
            results.put(USERNAME, username);
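            results.put(NAME, user.getProperty(NAME));
            // Include the author's hash as createPost and getPosts do, so the response shape matches.
            results.put(HASH, user.getProperty(HASH));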
            results.put(LIKES, post.getDegree(RelationshipTypes.LIKES));
            results.put(REPOSTS, post.getDegree(Direction.INCOMING)
                    - 1 // for the Posted Relationship Type
                    - post.getDegree(RelationshipTypes.LIKES)
                    - post.getDegree(RelationshipTypes.REPLIED_TO));
            tx.success();
        }
        return Response.ok().entity(objectMapper.writeValueAsString(results)).build();
    }

Our getPost method combs through all the posts posted on the same day as the post we are editing until it finds it. Most people don’t tweet too much on one day, unless they are live tweeting some kind of event or having a tweet storm, so this isn’t too big of a deal.

    public static Node getPost(Node author, Long time) {
        LocalDateTime postedDateTime = LocalDateTime.ofEpochSecond(time, 0, ZoneOffset.UTC);
        RelationshipType original = RelationshipType.withName("POSTED_ON_" +
                postedDateTime.format(dateFormatter));
        Node post = null;
        for(Relationship r1 : author.getRelationships(Direction.OUTGOING, original)) {
            Node potential = r1.getEndNode();
            if (time.equals(potential.getProperty(TIME))) {
                post = potential;
                break;
            }
        }
        if (post == null) { throw PostExceptions.postNotFound; }

        return post;
    }

You might have gotten the hint this was coming, since our createTags and createMentions methods both remove any pre-existing relationships. Deleting nodes and relationships used to leave holes in the Neo4j storage engine until the server was restarted, after which the space could be reclaimed by new nodes and relationships. That is no longer the case. In newer versions of Neo4j, the space freed by these deleted mentioned and tagged relationships is reused by new relationships without the need for a restart.

We will call it good for today, but stay tuned for more next time as we add the likes, repost and timeline functionality. On to part 5.


4 thoughts on “Building a Twitter Clone with Neo4j – Part Four”

  3. Evangelos says:

    Hi,

    Congrats for the nice examples! They are very useful.

    Based on the schema you defined, every post is a node. Today there are more than 150 billion Tweets.
    Do you think Neo4j can handle numbers like that?

    • maxdemarzi says:

      Archimedes is quoted as saying:

      “Give me the place to stand, and I shall move the earth.”

      So I will paraphrase… “Given the hardware, Neo4j shall host Twitter.”

      But really, there is no need to put everything on one giant graph or single server. You could have one graph of all the follow relationships, and you could split the tweet graph by day and make a few queries to complete each request.

      …gonna need some fast drives to handle the peak write loads though.
