Want to know how Apple's (AAPL) Genius song recommendation system for iTunes works? Apple engineer Erik Goldman offered up some insights to users of answer service Quora in a post back in May. While Goldman's post has since been deleted, Christopher Mims covered it in an MIT Technology Review story on Wednesday. Goldman's answer on Quora offered a sneak peek into the way big data analytics and aggregated personal information combine to personalize song recommendations and create custom content for iTunes customers. The Genius service boosts revenue for Apple, but insights into its workings could also benefit Web users as a whole.
Recommendation engines are the key to showing the entire Web on small devices, such as mobile phones, and to creating a hyperpersonalized surfing experience. For consumers, the Web has opened up billions of opportunities to find content, with much of it contained in the so-called long tail made famous by Wired's Chris Anderson. But mere mortals can't filter through all the possibilities to discover what the heck they want to read, watch, or listen to. Hence the popularity of recommendation engines and discovery services from such companies as Amazon.com (AMZN), Apple, Netflix (NFLX), and even Google (GOOG).
The heart of the Genius recommendation system is statistics applied to a large amount of data. The initial goal is to take an individual's playlist, measure the frequency of certain elements (such as the artist), and determine how significant each element might be in making a recommendation. To do that, the algorithms check the frequency of those elements in other Genius users' playlists to see which ones occur widely and which ones don't. Elements that appear almost everywhere say little about a listener's taste, while rare ones say a lot. This allows the system to compare playlists between people who like the same obscure bands rather than trying to draw conclusions based on the hundreds of millions of playlists that include Lady Gaga's Bad Romance.
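To make the idea concrete, here is a minimal sketch of that kind of frequency weighting. It is not Apple's code: the log-based weight is borrowed from the inverse-document-frequency scoring used in text search, and the function names, band names, and playlists are all made up for illustration.

```python
import math
from collections import Counter


def artist_weights(playlists):
    """Weight each artist by how rare it is across all playlists.

    Assumption: an IDF-style score, log(total playlists / playlists
    containing the artist). An artist on every playlist gets weight 0.
    """
    total = len(playlists)
    # Count the number of playlists that contain each artist at least once.
    playlist_counts = Counter(
        artist for playlist in playlists for artist in set(playlist)
    )
    return {
        artist: math.log(total / count)
        for artist, count in playlist_counts.items()
    }


def similarity(a, b, weights):
    """Score two playlists by summing the weights of the artists they share."""
    return sum(weights[artist] for artist in set(a) & set(b))


# Hypothetical playlists: everyone has Lady Gaga, few have the obscure bands.
playlists = [
    ["Lady Gaga", "Obscure Band A"],
    ["Lady Gaga", "Obscure Band A"],
    ["Lady Gaga", "Obscure Band B"],
    ["Lady Gaga"],
]
weights = artist_weights(playlists)

shared_obscure = similarity(playlists[0], playlists[1], weights)
shared_gaga_only = similarity(playlists[0], playlists[2], weights)
```

In this toy data the two listeners who share an obscure band score higher than two who share only Lady Gaga, whose weight drops to zero because she is on every playlist.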
Cutting the Data Down to Size
The second step is to work out which rules the recommendation engine can apply to your playlist to reduce the amount of data it must cycle through. These rules are the so-called latent factors. Christopher Mims writes:
Latent factors are what shake out when you do a particular kind of statistical analysis, called a factor analysis, on a set of data, looking for the hidden, unseen variables that cause the variation in all the different variables you're examining. Let's say that the variability in a dozen different variables turns out to be caused by just four or five "hidden" variables—those are your latent factors. They cause many other variables to move in more or less lock-step.
Discovering the hidden or "latent" factors in your data set is a handy way to reduce the size of the problem that you have to compute, and it works because humans are predictable: People who like Emo music are sad, and sad people also like the sound tracks to movie versions of vampire novels that are about yearning, etc. You might think of it as the mathematical expression of a stereotype—only it works.
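A toy version of this shows how one hidden variable can explain several visible ones. The sketch below is an assumption-laden stand-in, not a full factor analysis and not Genius itself: it uses power iteration to pull the single strongest factor out of a small table of play counts. The function name and the numbers are invented for illustration.

```python
import math


def top_latent_factor(matrix, iterations=100):
    """Return (listener_scores, song_loadings) for the strongest factor.

    Runs power iteration on A^T A, which converges to the leading right
    singular vector of A. Assumes a non-empty, non-zero matrix.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    loadings = [1.0] * n_cols
    for _ in range(iterations):
        # Project every listener onto the current song loadings (A v).
        projected = [
            sum(row[j] * loadings[j] for j in range(n_cols)) for row in matrix
        ]
        # Map back into song space (A^T u) and rescale to unit length.
        updated = [
            sum(matrix[i][j] * projected[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        norm = math.sqrt(sum(x * x for x in updated))
        loadings = [x / norm for x in updated]
    scores = [
        sum(row[j] * loadings[j] for j in range(n_cols)) for row in matrix
    ]
    return scores, loadings


# Hypothetical play counts: rows are listeners, columns are three songs.
# Each row is a multiple of the same pattern, so one hidden factor
# (how much the listener likes this style) drives all three columns.
plays = [
    [2.0, 1.0, 2.0],
    [4.0, 2.0, 4.0],
    [6.0, 3.0, 6.0],
]
scores, loadings = top_latent_factor(plays)

# Rebuild the full table from one score per listener and one loading per song.
rebuilt = [[s * w for w in loadings] for s in scores]
```

Here nine numbers collapse to three listener scores and three song loadings with nothing lost, because the data was built from one factor. Real listening data is noisier, so a handful of factors only approximates it, but the size reduction is the same idea.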
Winning the Netflix Prize
These techniques aren't rocket science; they're statistics-based. To learn more about how latent factors are uncovered, Goldman recommended folks turn to the site operated by the recent winners of the Netflix recommendation prize. For laypersons, I can recommend Wired's story covering the race to win the Netflix prize, which shows how most of the people trying to improve recommendation engines are doing so in the open and piggybacking on each other's efforts. That's something Apple doesn't seem to be endorsing, given that Goldman's post was deleted.
As the devices on which we consume our information become smaller, the need for better recommendations has moved beyond a nicety for discovering long-tail content into a necessity for displaying optimal results quickly over a mobile connection and on a small screen. I discussed this problem a while back with Elizabeth Churchill, principal research scientist and manager of the Internet Experiences Group at Yahoo (YHOO), and she emphasized that tailored recommendations are important for mobile users, not only because the screen sizes are small, but also because mobile connections are slower and people don't have the patience to wait for a lot of results to load.
The ability to use cloud computing, to access huge amounts of data, and then to crunch that data to make recommendations and deliver them in a format fit for mobile consumption, will be the key stepping-stones for the next generation of the Web.