Battling Data Monsters at Yahoo!

Usama Fayyad is helping the Internet giant figure out how to harness the massive amount of data it collects, in ways that profit advertisers and protect Web users

Usama Fayyad's colleagues say he battles monsters for a living. In the elite engineering circles that this former NASA rocket scientist inhabits, the job description passes for a wisecrack. But, like many jokes, there's truth behind it.

Fayyad is Yahoo!'s (YHOO) chief data officer, possibly the first person to hold such a position. His role since he took the post in December, 2004, has been to make both sense and money from the vast amounts of information Yahoo collects on the doings of 500 million people who visit its site every month.

Each day, Yahoo collects between 12 and 15 terabytes of data. This vast store includes the search keywords people type, the Yahoo pages they visit, the ads they click, the videos they watch, and even whether they scroll all the way down to the bottom of an article. Yahoo's daily data collection exceeds the digital size of the entire Library of Congress. A single terabyte alone is so massive that computer scientists named it after teras, the Greek word for "monster"—hence the humor in Fayyad's job description. "We used to call them 'terrorbytes,'" says Fayyad.

The "Data Wars"

Lately, the tongue-in-cheek explanation of Fayyad's employ seems more apt than ever. Fayyad, along with a growing number of executives at other companies who also oversee reams of data collected on their Web sites, is engaged in a major battle over how freely that information can be used to tailor ads to individuals. The monster, as even Fayyad sees it, is the potential to misuse the data, violating consumer privacy in the name of personalization and profits. "Humanity as a whole hasn't figured out how to deal with this," says Fayyad.

The battle came to a head recently with Web users and government regulators becoming involved like never before. On Dec. 5, Facebook Chief Executive Mark Zuckerberg apologized to users and changed his company's new advertising policy after more than 75,000 people signed a petition objecting to how the popular social network shared their information (11/30/07).

Facebook's retreat came just a month after the Federal Trade Commission held hearings concerning "behavioral targeting," the practice of tracking a user's online travels in order to show, say, an ad for mortgages and home-equity loans to a person who recently visited a real estate Web site. And on Dec. 11, IAC/InterActive's (IACI) search business responded to the mounting worry by announcing a tool that enables Web surfers to erase their keywords from the company's database.

Jeff Chester, executive director of the Center for Digital Democracy, a nonprofit consumer advocacy group, says there's more public outcry to come. "The technology is an unstoppable force able to collect data and target users throughout the ubiquitous off- and online landscape," says Chester. "This is just the beginning of the data wars."

If anyone seems ready for such a fight, it's Fayyad. Built like an oak tree, he stands a towering 6 feet 5 inches, a frame that would make nearly anyone think twice about trying to push him around physically. More important, with a PhD in engineering and two master's degrees, in computer science and mathematics, all from the University of Michigan, he knows just how much, or how little, data is needed to reach a marketer's objectives. When he confidently promises that he can help an advertiser target Yahoo users looking to purchase a new red truck in Montana without revealing e-mail addresses or other personal information, it's hard to doubt him.

Still, mistakes happen, often with dire consequences. In November, Yahoo CEO Jerry Yang apologized to a Senate committee and settled a lawsuit concerning Yahoo's role in disclosing the identity of a Chinese journalist (11/6/07) accused of revealing state secrets. Last year, Time Warner's (TWX) AOL was sued after posting Web search records on the Internet (8/23/06). The executives believed the records to be anonymous, but they were actually traceable back to individuals.

A few years ago, Fayyad would never have imagined being embroiled in a fight between privacy advocates and data-hungry advertisers. His passion was academic research. "If you had talked to me at any time [before the Yahoo job], I would have said, 'Never, never would I work in the advertising space and at a media company,'" says Fayyad.

Sifting Data for Sense

Gradually, however, Fayyad's goals changed. Web companies were collecting digital warehouses full of data about customers' online actions. Yet few understood how to take that information and transform it into something that could improve their businesses, says Fayyad.

A car company, for example, would have countless records of vehicle specifications people researched online. But it couldn't parse that data to maximize sales by, say, stocking more white cars at dealerships located near those people who frequently searched for that paint color online. "The biggest challenge is how to make sense of all the data and find something useful there," says Gregory Piatetsky-Shapiro, president of KDnuggets, a research and consulting firm specializing in data mining. "It is like standing under the waterfall and trying to drink."

Fayyad knows how to drink, so to speak, says Piatetsky-Shapiro, who co-authored a book with him in 1996. "Usama is a really excellent researcher and he has grown from a researcher to a very capable manager," says Piatetsky-Shapiro. "I was sure that he would go into business because, although he is a very good researcher, he had that business sense."

Fayyad used that acumen to start a data-mining company that eventually split into two firms: a consulting firm called DMX Group, and a targeted-ad firm called Revenue Science. Yahoo became a DMX client in 2004. Later that year, then-CEO Terry Semel came calling.

Semel had good reasons to want Fayyad. Google (GOOG) had just completed the most successful initial public offering of stock in recent memory on the strength of its robust success in placing ads tied to the information users were seeking with its search engine. Yahoo's data held similar promise.

Delivering ads based on a user's surfing behavior also held the potential to address what was becoming a problem for Yahoo: the fragmentation of audiences across an ever-increasing array of Web sites. Many of them, such as e-mail services and social networks, don't feature specific content obviously related to certain products a company might want to advertise. The notion of displaying ads on such Web pages based on the type of sites a user had been visiting before arriving there promised to increase ad revenues.

Responsible Use of Information

Money, however, wasn't Yahoo's only concern. Semel also needed someone who would keep certain information off-limits, lest Yahoo anger its users. "Semel said, 'Hey, you are here to make sure that we use our data responsibly,'" says Fayyad.

That directive has perhaps never been more important than now. Much of the data Yahoo, AOL, and others encounter comes in the form of cookies, text tags that are downloaded to a user's computer that indicate which sites that machine has been used to visit. But increasingly, the data collected online includes the interests and activities people promote on their social network pages, the location information they enter in mapping services, the reviews they write about restaurants, and the things they buy on the Web. "We are going to see companies testing the limits," says Dave Morgan, the recently appointed executive vice-president for global advertising strategy at AOL. "The protection of consumer privacy is going to be one of the most important issues in the future of digital media."

Standards for appropriate use of online data are slowly emerging. Users have shown, most recently with the Facebook protests, that they want to choose whether to "opt in," or agree to having their data collected when they could potentially be identified by that data, says Morgan.

In the future, Fayyad says, more explicit agreements could be required. One possibility is offering users rewards, such as coupons for online purchases, if they agree to be tracked. "It is still early stages," says Fayyad. "We need to be careful, but there is no need for panic…we will achieve an equilibrium."
