Connected

connected

Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives is a mind bending look at how no matter how individual we think we are, the people around us have a great amount of influence in our lives. One of the authors James Fowler was at GraphConnect 2012 and gave a presentation on this idea:

Stop and reflect on slide 12. This slide is showing us that people 3 degrees out from us have a measurable influence in our lives. Think about that for a second. We are being influenced by countless people we don’t even know… and what’s more, we ourselves are influencing people we don’t even know. Kinda makes you want to be a better person, and set a good example no?

In graph theory, connected components are subgraphs without links between them in a larger graph. You can see 3 connected components in the image below:

200px-Pseudoforest

When a social network is just beginning, it may have many connected components that eventually merge into one. I remember this being the case years ago on Linked In. One of the goals upon registering on the site, was to connect to the right people so you could also be part of the largest component. Nowadays, the social networks that we experience online are very dense, but what isn’t is the content that is generated on those sites. If you upload a baby picture on Facebook, some of your friends may comment on it, some of them may say they “like” it, some may scroll right past it in their feed, but nonetheless that baby picture and the people that interacted with it are a connected component of a larger graph. While your baby picture may not spread far, some “viral” content does. For example shocking or newsworthy tweets on Twitter or funny panda videos on Youtube, even e-mail chains on your company’s Exchange Server.

With the property graph model, we can choose which relationships to traverse to see how our graph is split into separate components. Let’s see how we can find all the connected components of a specific relationship type on Neo4j:

@Path("/service")
public class MyService {

    @GET
    @Path("/cc/{name}")
    public String getConnectedComponentsCount(@PathParam("name") String name, @Context GraphDatabaseService db) throws IOException {
        int CCid = 0;
        for ( Node n : GlobalGraphOperations.at( db ).getAllNodes() ) {
            if(!n.hasProperty("CCId")) {
                Transaction tx = db.beginTx();
                try {
                    Traverser traverser = Traversal.description()
                            .breadthFirst()
                            .relationships(DynamicRelationshipType.withName(name), Direction.BOTH)
                            .evaluator(Evaluators.excludeStartPosition())
                            .uniqueness(Uniqueness.NODE_GLOBAL)
                            .traverse(n);
                    int currentCCid = CCid;
                    CCid++;
                    n.setProperty("CCId", currentCCid);
                    for ( org.neo4j.graphdb.Path p : traverser )
                    {
                        p.endNode().setProperty("CCId", currentCCid);
                    }
                    tx.success();
                }
                catch ( Exception e )
                {
                    tx.failure();
                }
                finally
                {
                    tx.finish();
                }
            }
        }
        return String.valueOf(CCid);
    }
}

Above, we are using an Unmanaged Extension that takes the name of a relationship type. For every node in the graph, we check to see if it does not have a “CCId” property, and if so we use it as a starting point in our graph traversing as far as we can with that relationship type, marking all the nodes we reach with the same “CCId”. Then we check the next node in the graph and do the same until all nodes have been checked. Lastly we return the last “CCId” we used, just to know how many there were in total. The complete source code is available on github.com. This is just a simple example. In your application you may want to pass in multiple relationships, or only look at the nodes from a certain index instead of all of them. Do not try this at home on your relational database… anything but a trivial example is going to blow up on you.

If you want to learn more about Connected, you can watch a video by the book’s second author Nicholas Christakis below:

Also, join us at GraphConnect 2013!

GC-node-250x250

Tagged , , , , , , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: