People You May Know (PYMK) is LinkedIn’s link prediction system and one of the site’s most recognizable features. As the name implies, it tries to find other professionals you might know, allowing members to grow their networks. PYMK is now responsible for more than half the connections on the site and is a principal component of engagement. You can see it on the site here.
At its heart, PYMK is a machine learning problem: binary classification on whether a member will connect with another. One of the first signals to look at is friends-of-friends, or triangle closing: if Alice knows Bob and Bob knows Carol, then maybe Alice knows Carol. One can then score these candidate pairs with features such as whether the two members overlapped at an organization or school, their age difference (you’re more likely to know someone near your own age), their geographical distance, etc.
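To make triangle closing concrete, here is a minimal sketch of friends-of-friends candidate generation on a toy in-memory graph. The function name `friends_of_friends`, the dictionary-of-sets graph layout, and the sample members are assumptions made for illustration; this is not a description of how PYMK is implemented in production.

```python
from collections import Counter


def friends_of_friends(graph, member):
    """Return two-hop candidates for `member` with common-connection counts.

    `graph` maps each member to the set of members they are connected to.
    This layout is a hypothetical choice for the sketch.
    """
    direct = graph.get(member, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the member themselves and anyone already connected.
            if candidate != member and candidate not in direct:
                counts[candidate] += 1
    return counts


# Toy graph: Alice knows Bob and Dave; both of them know Carol.
graph = {
    "alice": {"bob", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"alice", "carol"},
}

candidates = friends_of_friends(graph, "alice")
print(candidates.most_common())
```

The common-connection count is only one possible input to the classifier; the overlap, age, and distance features mentioned above would be computed per candidate pair and fed to the model alongside it.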
There are several interesting and unstudied modelling challenges here. For example, in looking at organizational overlap, one must take into account the size of the organization, the length of overlap, geographic clustering, and the “propensity” of individuals in that organization to connect (some organizations are inherently more social than others).
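As a rough illustration of why organization size and length of overlap both matter, the sketch below computes how long two tenures overlap and then damps that value by the size of the organization. The helper names `overlap_days` and `org_overlap_feature`, and the log-damping formula itself, are hypothetical choices for this example, not the features PYMK uses.

```python
import math
from datetime import date


def overlap_days(start_a, end_a, start_b, end_b):
    """Number of days two tenures at the same organization overlap (0 if none)."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    return max(0, (earliest_end - latest_start).days)


def org_overlap_feature(days, org_size):
    """Hypothetical feature: years of overlap, damped by log of organization size.

    Two years together at a 50-person company is stronger evidence of
    acquaintance than two years together at a 50,000-person company.
    """
    return (days / 365.0) / math.log(1 + org_size)


days = overlap_days(date(2010, 1, 1), date(2012, 1, 1),
                    date(2011, 1, 1), date(2013, 1, 1))
small_org = org_overlap_feature(days, org_size=50)
large_org = org_overlap_feature(days, org_size=50_000)
print(days, small_org, large_org)
```

Geographic clustering and an organization's overall propensity to connect would need additional inputs and are left out of this sketch.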
Importantly, the scale of the problem, matching each of 185 million members against all the others, further complicates things. The resulting matrix is sparse, but it is still enormous, and many classical techniques break down at this scale.
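A quick back-of-the-envelope calculation shows what this scale means. The snippet below counts the unordered member pairs for 185 million members, which comes to roughly 1.7 × 10^16. The one-byte-per-pair storage figure is an assumption made only to give a sense of magnitude.

```python
MEMBERS = 185_000_000

# Number of distinct unordered pairs: n * (n - 1) / 2.
pairs = MEMBERS * (MEMBERS - 1) // 2

# Assumed for illustration: one byte of storage per pair in a dense layout.
dense_petabytes = pairs / 1e15

print(f"{pairs:,} candidate pairs")
print(f"about {dense_petabytes:.1f} PB at one byte per pair")
```

This is why exhaustive pairwise scoring is impractical, and why candidate generation is usually restricted to a much smaller set, such as friends-of-friends, before any classifier is applied.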
As part of this work, we’re:
Incidentally, if you didn’t know, PYMK in the context of online social networks was invented at LinkedIn. It first showed up on the site in 2006.