People You May Know

People You May Know (PYMK) is LinkedIn’s link prediction system and one of the site’s most recognizable features. As the name implies, it tries to find other professionals you might know, allowing members to grow their networks. PYMK is now responsible for more than half the connections on the site and is a principal component of engagement. You can see it on the site here.

At its heart, PYMK is a link prediction problem: binary classification on whether a member will connect with another. For example, one of the first things to look at is friends-of-friends, also known as triangle listing or triangle closing. Here, if Alice knows Bob and Bob knows Carol, then maybe Alice knows Carol. One can then score each candidate pair (each triangle that a new connection would close) with features such as whether the pair overlapped at an organization or school, their age difference (you’re more likely to know someone near your own age), their geographical distance, etc.
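The friends-of-friends step above can be sketched in a few lines of Python. This is only a minimal in-memory illustration, not the production system: the function name `friends_of_friends`, the edge-list input, and the sample members are all assumptions made for the example. It lists every unconnected pair that shares at least one connection and counts how many connections they share, which is a natural first feature for the classifier.

```python
from collections import Counter
from itertools import combinations


def friends_of_friends(connections):
    """Return {(a, c): common_connection_count} for pairs not yet connected.

    `connections` is an iterable of undirected (member, member) edges.
    Hypothetical helper for illustration only.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for a, b in connections:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    candidates = Counter()
    # Every member sits in the middle of a potential triangle between
    # each pair of their own connections.
    for friends in neighbors.values():
        for a, c in combinations(sorted(friends), 2):
            if c not in neighbors[a]:  # skip pairs that are already connected
                candidates[(a, c)] += 1
    return candidates


# Made-up sample graph.
edges = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "dave"),
    ("dave", "carol"),
    ("carol", "erin"),
]
suggestions = friends_of_friends(edges)
```

Pairs are keyed in sorted order, so Alice and Carol appear once as `("alice", "carol")`, with a count of 2 because both Bob and Dave connect them. At real scale this enumeration would have to be distributed rather than held in one process.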

There are several interesting and understudied modelling challenges here. For example, in looking at organizational overlap, one must take into account the size of the organization, the length of overlap, geographic clustering, and the “propensity” of individuals in that organization to connect (some organizations are inherently more social than others), as modeled in our WWW’13 paper on the topic. The intuition is simple: two members who worked together at a small company for 10 years have a greater affinity than two members whose time at a large company overlapped for only a few months.
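To make that intuition concrete, here is a toy overlap feature in Python. It is not the model from the WWW’13 paper: the function `overlap_affinity`, its functional form, and the `half_life_months` parameter are assumptions chosen only so that the score rises with the length of overlap, falls with organization size, and scales with an organization-level propensity. Geographic clustering is left out for brevity.

```python
import math


def overlap_affinity(overlap_months, org_size, propensity=1.0, half_life_months=24.0):
    """Toy affinity score for two members who overlapped at an organization.

    Illustrative assumption, not a published formula.
    """
    if overlap_months <= 0 or org_size < 2:
        return 0.0
    # Saturating time factor: 0.5 after `half_life_months`, approaching 1.0.
    time_factor = 1.0 - 0.5 ** (overlap_months / half_life_months)
    # Larger organizations dilute the chance that two given members met.
    size_factor = 1.0 / math.log2(org_size)
    return propensity * time_factor * size_factor


# Ten years together at a 20-person company vs. three months at a 50,000-person one.
small_long = overlap_affinity(overlap_months=120, org_size=20)
large_short = overlap_affinity(overlap_months=3, org_size=50_000)
```

In this sketch `small_long` comes out far larger than `large_short`, matching the intuition above. In practice such a score would be one input among many to the classifier rather than a prediction on its own.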

Importantly, the scale of the problem further complicates things: each of 225 million members must be matched against all the other members, and although the resulting matrix is sparse, it is still enormous. Many classical techniques in both modelling and infrastructure break down at this scale. As part of this work, we developed Voldemort’s read-only extensions (presented at FAST’12) for serving these results. PYMK was also the basis for LinkedIn’s analytic stack (presented at SIGMOD’13), which is leveraged by most of the data applications on the site.

As part of this work, we’re:

  • Developing techniques and systems for large-scale machine learning;
  • Building infrastructure around processing graphs with billions of edges;
  • Evaluating the social graph (and other graphs) in higher-dimensional spaces;
  • Modelling the strength of the relationship that each edge on the graph represents (known as connection strength).

Incidentally, if you didn’t know, PYMK was invented at LinkedIn. It first showed up on the site in 2006.