Matches are the New Hotness

How do you help a person without a job find one online? A search screen. How do you help a person find love online? A search screen. How do you find which camera to buy online? A search screen. How do you help a sick person self-diagnose online? I have no idea, I go to the doctor. It doesn’t matter. What I want to tell you is that there is another way.

Now, search is great. It usually helps people find what they’re looking for… but sometimes they have to dig through tons of stuff they don’t really want. Why? Because people can usually think of what they want, but not of what they don’t want to come back. So you end up with tons of results that are not very relevant to your user… and unless you are one of the major search engines, your search is not very smart.

Does your search know about your user? Does it know what your user has permissions to see?
How does it tie into personalized recommendations? Does it take advantage of the things we already know about a user? It’s this last question that’s most important, and I want to show you how you can use a graph database like Neo4j to help your users find what they want differently.

Pretend you are looking for a job. Think of the things you put on your resume: your location, your education, your work history, your skills, your references, etc. Now pretend you ARE a job post. Yes, I know I am asking you to be an inanimate object. Just give it a shot. What do you want in a candidate? Someone near your location, with a certain level of education and a mix of skills, preferably vetted by someone you respect, etc.

Both the job candidate and job post are thinking about the same things, but if you look at a resume and a job description, you will realize they aren’t speaking the same language. Why not? It’s so obvious it has been driving me crazy for years and was one of the reasons I built Vouched and got into this Graph Database stuff in the first place. So let’s solve this problem with a graph.

I am going to make Candidates and Job Posts speak the same language by connecting them to location and skill nodes. I am keeping things simple, but anything that has some relationship to both people and job posts could be used.

Mary lives in California, knows C# and knows PostgreSQL. There is a job post that requires a candidate be in California, know C# and know PostgreSQL. We connect these two together in our graph like so:
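As a rough sketch (not code from the post), the toy graph described above could also be written down as plain Python data. The triple layout, the `EDGES` name, the `neighbors` helper and the node name "Job 1" are all made up for illustration; only the relationship types come from the query that follows.

```python
# Hypothetical in-memory version of the toy graph as
# (start node, relationship type, end node) triples.
# "Job 1" is an invented name for the job post.
EDGES = [
    ("Mary", "lives_in", "California"),
    ("Mary", "has", "C#"),
    ("Mary", "has", "PostgreSQL"),
    ("Job 1", "in_location", "California"),
    ("Job 1", "requires", "C#"),
    ("Job 1", "requires", "PostgreSQL"),
]


def neighbors(node, rel_type):
    """Return the nodes reached from `node` over outgoing `rel_type` relationships."""
    return [end for start, rel, end in EDGES if start == node and rel == rel_type]


mary_skills = neighbors("Mary", "has")
job_requirements = neighbors("Job 1", "requires")
```

Mary and the job post never point at each other directly. They only meet through the location and skill nodes they share.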

Now we can use our graph to get from Mary to this Job or any job where she would be a great candidate with the following Cypher query.

START me=node:users_index(name={user})
MATCH skills<-[:has]-me-[:lives_in]->city<-[:in_location]-job-[:requires]->requirements
WHERE me-[:has]->()<-[:requires]-job
WITH DISTINCT city.name AS city_name, 
     job.name AS job_name,
     LENGTH(me-[:has]->()<-[:requires]-job) AS matching_skills,
     LENGTH(job-[:requires]->()) AS job_requires,
     COLLECT(DISTINCT requirements.name) AS req_names, 
     COLLECT(DISTINCT skills.name) AS skill_names
RETURN city_name, job_name,
       FILTER(name IN req_names WHERE NOT name IN skill_names) AS missing
ORDER BY matching_skills / job_requires DESC, job_requires
LIMIT 10

That looks a little complicated, let’s break it down piece by piece.

START me=node:users_index(name={user})

We start with the things we know, and right now all we know is we have a user named Mary that we can look up in our users_index and use as our starting point in the traversal. This query will be used by other users, so we parametrize it with {user}.

MATCH skills<-[:has]-me-[:lives_in]->city<-[:in_location]-job-[:requires]->requirements

Then we look for patterns in the graph that look like the MATCH segment above. Cypher knows “me” and nothing else, but as it traverses relationships from “me” it binds nodes and relationships to the names we give them along the way. This is a little different if you are used to SQL, where all your tables already have names. Here I am naming all nodes reached by an outgoing relationship of type “has” from “me” as skills, because that is what they are in my mind. I could have called those nodes anything else; it doesn’t matter to Neo4j. I do the same for the other nodes along the pattern.

WHERE me-[:has]->()<-[:requires]-job

I only want to match jobs WHERE I share at least one skill that the job requires. Notice we are using a pattern as our WHERE clause. We don’t name the nodes in between because at this point we only care that one exists.
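To make the “at least one exists” idea concrete, here is a small Python sketch of the same test on plain sets. It is an analogy, not how Neo4j evaluates the pattern, and `shares_a_skill`, “Job 1” and “Job 2” are hypothetical names.

```python
def shares_a_skill(user_skills, job_requirements):
    """True when at least one skill links the user to the job."""
    return any(skill in job_requirements for skill in user_skills)


mary = {"C#", "PostgreSQL"}
jobs = {
    "Job 1": {"C#", "PostgreSQL"},
    "Job 2": {"Ruby", "Redis"},
}

# Keep only the jobs that pass the pattern-style check.
candidates = [name for name, reqs in jobs.items() if shares_a_skill(mary, reqs)]
```

Like the unnamed `()` node in the pattern, the check never records which skill matched, only that one did.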

WITH DISTINCT city.name AS city_name, 
              job.name AS job_name,

We can chain queries together using the WITH clause which pipes the results from one query to the next. The DISTINCT clause returns only the unique records found. Here we start by passing the name property of both city and job.

LENGTH(me-[:has]->()<-[:requires]-job) AS matching_skills,

We continue to capture results that we want to pass to the next query. Here we are using the LENGTH function not to find the length of the path between a user and a job, since we know they are just 2 hops apart, but to count the number of paths that exist between them.

LENGTH(job-[:requires]->()) AS job_requires,

We use the same trick again to get the number of skills required by each job.
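In plain Python terms, the two counts amount to something like the sketch below. The function name `count_paths` is invented, and the sketch assumes each skill is a single normalized value, so one shared skill means exactly one has/requires path.

```python
def count_paths(user_skills, job_requirements):
    """Return (matching_skills, job_requires) for one user/job pair."""
    # One user-to-job path exists per skill that is both held and required.
    matching_skills = sum(1 for skill in user_skills if skill in job_requirements)
    # One outgoing "requires" relationship exists per requirement.
    job_requires = len(job_requirements)
    return matching_skills, job_requires


matching_skills, job_requires = count_paths(
    {"C#", "PostgreSQL"},
    {"C#", "PostgreSQL", "Redis"},
)
```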

COLLECT(DISTINCT requirements.name) AS req_names, 
COLLECT(DISTINCT skills.name) AS skill_names

Then we COLLECT the names of the requirements and skills into two arrays.

RETURN city_name, job_name,
       FILTER(name IN req_names WHERE NOT name IN skill_names) AS missing

We then RETURN the values we are interested in, and use the FILTER function to keep only the names of the skills the job requires that the user does not have.
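The FILTER expression reads much like a list comprehension. A minimal Python equivalent, with a hypothetical `missing_skills` helper and made-up sample names, might look like this:

```python
def missing_skills(req_names, skill_names):
    """Keep each required name that is not among the user's skill names."""
    return [name for name in req_names if name not in skill_names]


req_names = ["C#", "PostgreSQL", "Redis"]
skill_names = ["C#", "PostgreSQL"]
missing = missing_skills(req_names, skill_names)
```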

ORDER BY matching_skills / job_requires DESC, job_requires

We then ORDER BY the percentage of skills the user matches for the job in DESCending order, and use the number of required skills of the job as a tie breaker.

LIMIT 10

Lastly we LIMIT the number of results to the top 10.
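The ordering and the limit can be sketched in Python as a sort key followed by a slice. The rows and job names below are invented for illustration. The sketch uses true division for the ratio; the post does not say how the `/` operator treats two integers, so that is worth checking for the Cypher version in use.

```python
rows = [
    {"job_name": "Job A", "matching_skills": 2, "job_requires": 4},
    {"job_name": "Job B", "matching_skills": 2, "job_requires": 2},
    {"job_name": "Job C", "matching_skills": 3, "job_requires": 3},
    {"job_name": "Job D", "matching_skills": 1, "job_requires": 3},
]


def sort_key(row):
    """Best match ratio first; fewer required skills breaks ties."""
    ratio = row["matching_skills"] / row["job_requires"]
    return (-ratio, row["job_requires"])


# Sort by the key, then keep at most the first 10 rows (the LIMIT).
ranked = sorted(rows, key=sort_key)[:10]
ranked_names = [row["job_name"] for row in ranked]
```

Job B and Job C both match every requirement, so the tie breaker puts Job B first because it requires fewer skills.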

We now have data we can display to a user.

A user would then be able to see what jobs they qualify for and what jobs they almost qualify for. Maybe they do know those skills and just forgot to mention them in their profile, or maybe it’s something they can learn quickly… or they can lie.

If you implement this in such a way that you get the user’s skills and location up front, you don’t even need to provide a search screen, and instead go right into results.

You can see it running live on Heroku and, as always, the code is available on GitHub.

If you think you might have a use for this solution, don’t hesitate to get in touch and I’ll be happy to help.

10 thoughts on “Matches are the New Hotness”

  1. pdursuau says:

    As always, a great post!

    I do have a quibble about solving the different language of resumes and job ads by substring matching. What if my resume lists “mapreduce,” and “yarn,” but the job ad says “mapreduce?” You and I can detect the match, but not what I am seeing here.

    And if we were to hard code that in as a match, how do we enable others to use our discovery that should be a match?

    BTW, the “not knowing what to exclude” is a very clever point!

    Hope you are having a great day!

    Patrick

    • maxdemarzi says:

      Well the assumption here is that you’ve gone beyond substring matching in both your resumes and job posts. The skills/cities should all be normalized named entities. Either by smart entity extraction, or restricting your users (on both ends) to use only pre-approved skills/cities. You would map “SQL Server”, “Microsoft SQL Server”, “MS SQL Server”, “SQL Server 2008”, etc into one normalized node and use that for your matching.

      • pdursuau says:

        That works but I thought we had left the burdens of normalization behind with SQL. ;-)

        More seriously, I have reservations about normalization. Works for example in library catalogs, but not always well and there are trade-offs. Such as loss of the original terminology, which could be important.

        And we are no longer limited to the space on a 3×5 card. We can be more liberal in terms of alternative identifications.

  2. Brilliant post Max. Graph-based matchmaking made simple to understand. I have been working on a recommendation for job-seekers and employers. This is exactly what we do with our Neo4j instance!

    Graphs help one think beyond keyword or string matching, indeed! :)

  3. great post, thanks for sharing!
    shouldn’t it be possible to link variants of skills to one normalized node, to keep original versions, but still enable finding them?
    I don’t think you can live without grouping things into classes; that’s a logical necessity, isn’t it?

  4. As always another great post Max. Have you thought about that staffing app we spoke about? :)