Finding Fraud Part Two

In the last blog post, we saw how we can use Neo4j to find the merchants where credit card fraud originated or was used for testing stolen data in order to prevent further fraudulent charges. It stemmed from a webinar on our amazing youtube channel with has hundreds of videos about graphs and Neo4j. We will continue diving in to the technical details by looking at how Neo4j can help you find Fraud Rings. The way this fraud works is that a large set of synthetic accounts are created and act like normal customers. Over time they request higher and higher levels of credit which they pay back on time. Then they all request the maximum credit they can get, take out the money, and disappear! Let’s find them before this happens.

We’ll start by having a bunch of customers. We will reuse John, Sheila and Karen from the last blog post:

 
CREATE (john:User {name:"John"})
CREATE (sheila:User {name:"Sheila"})
CREATE (karen:User {name:"Karen"})

Our graph looks like this so far:

But a few unconnected nodes are not much of a graph, let’s add a few more things. John has a credit card, a bank account and an unsecured loan.

 
CREATE (cc1:Card {number:"4012888888881881", balance: 493.23})
CREATE (ba1:Account {number:"85474584", balance:1322.30, type:"Checking"})
CREATE (us1:Loan {number:"63493639", balance:5000.00, rate: 5.8, type:"Unsecured Loan"})
CREATE (john)-[:HAS_ACCOUNT]->(cc1)
CREATE (john)-[:HAS_ACCOUNT]->(ba1)
CREATE (john)-[:HAS_ACCOUNT]->(us1)

Sheila has a bank account and a credit card, as well as a SSN tied to her account.

 
CREATE (ba2:Account {number:"25384738", balance:2983.99, type:"Checking"})
CREATE (cc2:Card {number:"5105105105105100", balance: 893.11})
CREATE (ssn2:Identification {number:"000-42-4329", type:"SSN"})
CREATE (sheila)-[:HAS_ACCOUNT]->(ba2)
CREATE (sheila)-[:HAS_ACCOUNT]->(cc2)
CREATE (sheila)-[:HAS_ID]->(ssn2)

Karen has a bank account, an unsecured loan and a phone number tied to her account:

 
CREATE (ba3:Account {number:"63493639", balance:3204.83, type:"Checking"})
CREATE (us2:Loan {number:"28372342", balance:5000.00, rate: 6.0, type:"Unsecured Loan"})
CREATE (phone2:Phone {number:"312-606-0842"})
CREATE (karen)-[:HAS_ACCOUNT]->(ba3)
CREATE (karen)-[:HAS_ACCOUNT]->(us2)
CREATE (karen)-[:HAS_PHONE]->(phone2)

Our graph now looks like this, and so far so good. Nothing to worry about here.

Let’s add a few more things we know about our 3 customers. John and Sheila are sharing a phone number, ok could be a couple or maybe room-mates, or maybe the number used to belong to John and now it belongs to Sheila, so far things look ok:

 
CREATE (phone1:Phone {number:"312-876-5309"})
CREATE (john)-[:HAS_PHONE]->(phone1)
CREATE (sheila)-[:HAS_PHONE]->(phone1)

John and Karen are sharing an identification number, hum… how did that happen?

 
CREATE (ssn1:Identification {number:"000-91-7434", type:"SSN"})
CREATE (john)-[:HAS_ID]->(ssn1)
CREATE (karen)-[:HAS_ID]->(ssn1)

…and it turns out all 3 of these users happen to have the same address. That makes the shared telephone number make sense, but not the shared identification number.

 
CREATE (ad:Address {line1:"175 N. Harbor Drive", city:"Chicago", state:"IL", zip:"60601"})
CREATE (john)-[:HAS_ADDRESS]->(ad)
CREATE (karen)-[:HAS_ADDRESS]->(ad)
CREATE (sheila)-[:HAS_ADDRESS]->(ad)

We know these accounts are connected because we’re looking at a picture of it, but how can we tell Neo4j they are connected? In the world of Graph Theory what we are looking for is “Connected Components” which are subgraphs of nodes connected together.

To accomplish this task we can make use of the Neo4j Graph Algorithms Plugin, specifically the “Union Find” algorithm.

Make sure you have the Neo4j Graph Algorithms plugin installed. It’s easy if you use Neo4j Desktop but if not you can download it from the Neo4j Download Center (scroll down about half way). You will need to unzip and place the .jar file inside the neo4j “plugins” folder as well as configure the neo4j.conf file in the “conf” directory to allow the use of the graph algorithm stored procedures by adding this line and restarting Neo4j:

 
dbms.security.procedures.unrestricted=algo.*

Before we continue, we’ll add “Robert” as a user, with a checking account and credit card, but not directly connected to anything else in our graph.

 
CREATE (robert:User {name:"Robert"})
CREATE (ba4:Account {number:"8374927", balance:1273.39, type:"Checking"})
CREATE (cc3:Card {number:"378282246310005", balance: 134.72})
CREATE (robert)-[:HAS_ACCOUNT]->(ba4)
CREATE (robert)-[:HAS_ACCOUNT]->(cc3)

Alright, we should have 2 connected components in our graph. Let’s find them. We’ll use the streaming version of the unionFind algorithm and convert the nodeIds it returns to the actual nodes, so we can retrieve the name property alongside the “setId” assigned by the algorithm. The Neo4j Graph Algorithms have an amazing option of creating an in-memory graph using “Cypher Projection“. This option allows us to define the nodes and relationships of a virtual graph and run our algorithms on that instead of the actual graph. We are looking for connected users, but there are only indirect relationships between them, with Cypher projection we can turn their indirect relationships into virtual direct relationships like so:

 
CALL algo.unionFind.stream(
  'MATCH (p:User) RETURN id(p) as id',
  'MATCH (p1:User)-->()<--(p2:User)
   RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher'}
) YIELD nodeId, setId 
RETURN algo.asNode(nodeId).name AS user, setId

…and as expected, John, Sheila and Karen are in one set, while Robert is by himself on another.

This is pretty good, but what if the fraud rings made sure not to use any of the same identifying information for any of the accounts they opened? Let’s go back to what the graph looked like here:

Now what do we do? Well we can look at how they accessed these accounts. Let’s say the accounts for John and Sheila were accessed via the Interactive Voice Response (IVR) by the same ANI phone number.

 
MATCH (john:User {name:"John"}), (sheila:User {name:"Sheila"})
CREATE (ani:ANI {number:"312-666-1234"})
CREATE (ani)-[:CALLED]->(john)
CREATE (ani)-[:CALLED]->(sheila)

…and let’s say the accounts for John and Roberts were access by the same browser fingerprint.

 
MATCH (john:User {name:"John"}), (robert:User {name:"Robert"})
CREATE (fg:Browser {fingerprint:"asdf7373jsdf3rw"})
CREATE (fg)-[:ACCESSED]->(john)
CREATE (fg)-[:ACCESSED]->(robert)

Now our graph looks like (we left out Karens accounts for brevity):

Let’s run the union find algorithm again, but this time we will just use the “ACCESSED” relationships in our Cypher projection query.

 
CALL algo.unionFind.stream(
  'MATCH (p:User) RETURN id(p) as id',
  'MATCH (p1:User)<-[:ACCESSED]-()-[:ACCESSED]->(p2:User)
   RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher'}
) YIELD nodeId, setId 
RETURN algo.asNode(nodeId).name AS user, setId

Now John, Sheila and Robert are all in the same set, while Karen is by herself:

Now we can then combine both the browser/ivr access and user attributes together to see if they are all really part of one big fraud ring by looking for any indirect relationships in any direction between users in our Cypher projection:

 
CALL algo.unionFind.stream(
  'MATCH (p:User) RETURN id(p) as id',
  'MATCH (p1:User)--()--(p2:User)
   RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher'}
) YIELD nodeId, setId 
RETURN algo.asNode(nodeId).name AS user, setId

…and there it is. All four of our users are connected in some shape or form into the same set.

It’s pretty normal to find connected components of small sizes, for example a couple with separate accounts using the same computer at home, but large connected components should probably be investigated. If we have a large number of accounts, it may be better to use the non-streaming union find procedure and write the results into the node properties like this.

 
CALL algo.unionFind(
  'MATCH (p:User) RETURN id(p) as id',
  'MATCH (p1:User)--()--(p2:User)
   RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher'}
) YIELD setCount

That stored procedure call will add a “partition” property to our User nodes, then we can return the top partitions with this query:

 
MATCH (n:User)
RETURN n.partition, COUNT(*) AS members, COLLECT(n.name) AS names
ORDER BY members DESC
LIMIT 10

You would run the union find algorithm and this query regularly to find connected components, but when creating new accounts or getting new information for existing accounts (change of address, ivr logs, web lots, etc) we can simply check if any of our new nodes are connected to existing partitions or bridge multiple partitions indicating there may be a fraud ring in the making.

You can find the code for this post on this gist. Now you know how to find Fraud Rings using Neo4j, so go on and fight fraud!

“Oh wow!” and “Oh no!” tend to be the universal reactions of clients seeing the results of a @neo4j fraud ring query for the first time.

— Max De Marzi is building RageDB (@maxdemarzi) August 2, 2019

Max De Marzi

Graphs, Graphs, and nothing but the Graphs