Seek and Ye Shall Be Found

Search data stored by the likes of Google and AOL is a privacy time bomb. It's time for these Net giants to hit the delete key.

During a recent panel discussion, Jennifer Mardosz, Qwest's (Q) chief privacy officer and corporate counsel, told the audience she was skeptical of congressional mandates laying out requirements for data retention. She argued that there was no need for legislative interference because "companies were already doing the right thing."

Google (GOOG) CEO Eric Schmidt also addressed the privacy issue at another conference this month, noting that he was more afraid of a government (U.S. or other) trying to get access to Google's data than of an accidental release of confidential customer information. When asked why Google doesn't purge its search information, Schmidt replied that the company didn't need to because security protections would make it difficult, if not impossible, to steal customer data.

Several other major companies have said something similar whenever the subject of confidential data comes up. The "right thing" that most of them are doing to protect our privacy is to trust their own security while retaining their options—and, incidentally, our personal information—as long as they can.


One lesson that the Information Age has taught us is that no computer system is impervious to hacking if the value of the material or the need of the outsider is great enough. No policy can withstand a determined bureaucrat armed with subpoenas or empowered by an Act of Congress. And certainly no organization is accident-proof.

Most companies don't routinely and purposefully delete their data. It costs more to purge than to store, so businesses take the path of least resistance. Historically, this has caused orphaned account information to linger far too long at consumer companies.

Information saved by search firms is a greater threat to privacy than out-of-date account data maintained by telecommunication companies like Qwest, because analyzing a user's queries over time can provide remarkable insight into the person's thoughts, habits, and lifestyle. Misuse of search histories is a threat to privacy that has been getting significant media attention in the last year. The threat is often downplayed because most users don't believe that anyone could or would reconstruct their search history—and even if someone did, many people suspect nothing personal would be revealed.


We got a chance to find out just how wrong that thinking is a couple of weeks ago, when an AOL employee did a peculiar thing—he published three months of AOL Web searches detailing the interests of more than 650,000 AOL users. The data was supposedly sanitized for privacy by removing the account information.

AOL issued a "My bad" press release right afterward, and three people subsequently resigned, including the chief technology officer and the overly generous researcher himself, but the damage was done (see, 8/23/06, "Fallout from AOL's Flub"). The information was out there for a good part of the day and downloaded by several people, some of whom have since set up sites where the public can search the searches themselves.

The AOL users' true names were replaced with arbitrary numbers, but if anyone has any lingering doubts about whether personally identifiable information can be deduced from looking at this kind of abbreviated search information, I encourage them to find a copy on the Web and convince themselves otherwise. (Note: It seemed unethical to put a link to the data here, so astute readers will have to find it themselves.)


Reading these search logs isn't like reading a bunch of disjointed and random words, as search companies would have you believe. Instead, they read like stories, or tales about individuals. It's as personal as poking through a neighbor's garbage can. You feel like you know something about the searcher because what they ask about often provides insight into their lifestyles and quirks.

For example, dozens of people looked for information on suicide, including how-to guides. Several people wanted to know how a pregnancy is affected by all kinds of things, including Adderall, Darvocet, and tanning beds. One person searched for pictures of Britney Spears naked and later looked for board of education Web sites in Michigan. Several people were even completely "outed" because at one point they had searched on their real name, address, or other personally identifiable information.

This information appears to be exactly what the Justice Dept. wanted from Google several months ago. Google refused to hand over the data, went to court, and sort of won, in the sense that it only had to give the government some diluted information. The AOL experience makes it clear that removing user identification from search histories doesn't guarantee privacy. This kind of data is probably just what the government wants, and it's what the government will get if it succeeds with future subpoenas.


The Justice Dept. has requested that companies retain data to facilitate subpoenas, and there's at least one bill pending in the House that would require ISPs to do the same. The writing is on the wall: whatever is being saved by Google, AOL, and others may very well be accessed eventually by the feds. As long as search companies save this data, consumers have a privacy sword of Damocles hanging over their heads.

The only way to remove this threat is for search companies to voluntarily delete the information from their search logs, forgoing whatever future revenue or marketing advantage they might be able to get from exploiting the data. If the companies persist in retaining this information, it will get out sooner or later. It will be used by the U.S. government and perhaps other governments, demanded in civil suits, or even stolen by hackers.

I call on the search companies to do the right thing: If you don't keep our information, no one can ever get it from you.
