Your Data, Naked On The Net
The U.S. Justice Dept.'s demand for data on how Web surfers use Google (GOOG) and other search engines raises a disturbing question: Just how much do the Web sites you visit know about you? In general, they know a great deal about the aggregate behavior of visitors, and nothing about individuals unless they have chosen to identify themselves. But there are exceptions.
Operators of even the most modest Web sites can learn a lot about visitors, short of pinpointing their actual identities. I manage a site for a small nonprofit. The hosting service, Homestead Technologies, throws in analytical tools from Media Highway International's RealTracker. I can tell the order in which visitors viewed pages, what Web sites they came from, and what search terms they used, among many other things. This information is invaluable for designing effective Web sites.
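To make that concrete, here is a minimal Python sketch of the kind of bookkeeping such a tool might do with its visit log. It is not how RealTracker actually works: the log format, the sample entries, and the function names are hypothetical, and it assumes the search engine passes the query in a `q` parameter of the referring URL, as Google's results pages do.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

# Hypothetical log entries: (visitor IP address, page viewed, referring URL),
# assumed to be listed in the order the requests arrived.
LOG = [
    ("203.0.113.7", "/index.html", "http://www.google.com/search?q=food+bank+volunteers"),
    ("198.51.100.23", "/donate.html", "http://www.example.org/links.html"),
    ("203.0.113.7", "/volunteer.html", "http://www.mysite.example/index.html"),
    ("203.0.113.7", "/contact.html", "http://www.mysite.example/volunteer.html"),
]

def search_terms(referrer):
    """Return the 'q' query parameter of a referrer URL, or None if absent."""
    values = parse_qs(urlparse(referrer).query).get("q")
    return values[0] if values else None

def summarize(log):
    """Group page views by IP address, keeping the order pages were viewed."""
    paths = defaultdict(list)
    entry = {}
    for ip, page, referrer in log:
        if ip not in entry:
            # The first request from an address shows where the visitor came from.
            entry[ip] = (urlparse(referrer).netloc, search_terms(referrer))
        paths[ip].append(page)
    return {ip: (entry[ip], paths[ip]) for ip in paths}

for ip, ((site, terms), pages) in summarize(LOG).items():
    print(ip, "came from", site, "searching for", terms, "and viewed", pages)
```

Grouping by address is a shortcut: several people behind one company or school address would be lumped together as a single visitor, which is one reason these reports describe traffic rather than individuals.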
We don't ask visitors to register, and the only identifying information recorded in the data is a numeric Internet (IP) address of up to 12 digits. This address normally links the visitor only to a large organization, such as their Internet service provider, employer, or school, and provides no clue to individual identity.
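An address of the older IPv4 kind is written as four numbers from 0 to 255, which is where the "up to 12 digits" comes from. The sketch below, in Python, illustrates why such an address usually points only to an organization: addresses are handed out in blocks, so a lookup finds the block's holder, not a person. The assignment table and function name are hypothetical, and the ranges are reserved documentation blocks rather than real allocations.

```python
import ipaddress

# Hypothetical table of address blocks and the organizations holding them.
# These ranges are reserved for documentation and are not real assignments.
ASSIGNMENTS = {
    ipaddress.ip_network("203.0.113.0/24"): "Example Cable ISP",
    ipaddress.ip_network("198.51.100.0/24"): "Example State University",
}

def organization_for(address):
    """Return the organization whose block contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for network, owner in ASSIGNMENTS.items():
        if ip in network:
            return owner
    return None

print(organization_for("203.0.113.7"))
print(organization_for("192.0.2.1"))
```

Getting from the organization to an individual requires that organization's own records of which customer or user held the address at a given moment.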
The situation is somewhat different on sites where you have registered. These can link your activity to whatever identifying information you have supplied, anything from a made-up user name and possibly a fake e-mail address to your real name, address, and credit card information if you have divulged them. Once you give out that data, your life can be an open book. You can make this tracking harder by setting your browser to reject "cookies," small files that sites store on your computer to recognize you on return visits, but this will cause many Web sites to work badly or not at all.
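One common way a site ties your browsing to your account is a session cookie. The Python sketch below shows the idea under simplifying assumptions: the session table, the cookie name `session`, and the function names are hypothetical, and real sites differ in the details. At login the site stores a random ID next to your account record and hands the ID to your browser; every later request that carries the ID back can be matched to you.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side table linking random session IDs to account records.
SESSIONS = {}

def log_in(account):
    """Create a session for a registered account and return a Set-Cookie header."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = account
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")

def identify(cookie_header):
    """Look up the account behind the Cookie header a browser sends back, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session")
    return SESSIONS.get(morsel.value) if morsel else None

header = log_in({"name": "Pat Doe", "email": "pat@example.com"})
# On later visits the browser returns just the name=value pair.
returned = header.split(":", 1)[1].split(";")[0].strip()
print(identify(returned))
# A browser that rejects cookies sends nothing back, so the site cannot match it.
print(identify(""))
```

The same mechanism that lets the site remember your identity is what keeps you logged in, which is why turning cookies off breaks so many sites.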
THE EXTENT TO WHICH WEB SITES use the data they collect is limited by their privacy policies, which the Federal Trade Commission can force sites to honor. If you live in the European Union, the EU Privacy Directive gives you much stronger legal protections than you get as a U.S. resident.
Privacy policies vary greatly. Google promises not to share any personally identifiable data with third parties without explicit consent. But BusinessWeek Online, like many commercial sites, reserves the right to share information (other than credit card data) with "selected outside companies whose products or services we feel may be of interest to you" unless an individual explicitly opts out.
There are, unfortunately, two factors that could put your privacy at much greater risk than you'd anticipate. One is advanced technology, the other a growing government appetite for information. Progress in mathematics and computer science is making it possible to assemble tiny, disparate bits of information into a comprehensive picture of an individual. For example, studies have shown that 87% of the U.S. population can be uniquely identified by date of birth, sex, and five-digit residential ZIP code alone. Someday you may be identifiable just from your tastes in books, movies, and sports as revealed by your Web browsing.
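The reasoning behind that 87% figure can be shown in a few lines of Python. This is an illustration of the idea, not a reproduction of the research: the records, the names, and the function names are all hypothetical. A record is risky when its combination of birth date, sex, and ZIP code occurs only once, because anyone holding another list with the same three fields plus names can then attach a name to it.

```python
from collections import Counter

# Hypothetical "anonymous" records: (date of birth, sex, five-digit ZIP code).
RECORDS = [
    ("1961-07-04", "F", "02138"),
    ("1961-07-04", "M", "02138"),
    ("1975-01-15", "F", "10001"),
    ("1975-01-15", "F", "10001"),
    ("1988-11-30", "M", "94103"),
]

# Hypothetical named list with the same three fields (one entry per combination).
NAMED = {
    ("1961-07-04", "F", "02138"): "Jane Roe",
    ("1975-01-15", "F", "10001"): "Sam Poe",
}

def fraction_unique(records):
    """Return the share of records whose attribute combination appears only once."""
    counts = Counter(records)
    return sum(1 for r in records if counts[r] == 1) / len(records)

def reidentify(records, named):
    """Attach a name to each record that is unique on its three attributes."""
    counts = Counter(records)
    return {r: named[r] for r in records if counts[r] == 1 and r in named}

print(fraction_unique(RECORDS))  # 0.6
print(reidentify(RECORDS, NAMED))
```

Only the first record gets a name: the "Sam Poe" combination matches two records, so it cannot be pinned to either one. The more attributes are combined, the more records become unique.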
Government inquisitiveness is a much more immediate risk to privacy. The request that Google is fighting seeks only search terms, but the Justice Dept. could have asked for the Internet addresses that went with them. It could then ask Internet service providers and other network operators to identify the people those addresses were assigned to, pinpointing who made each search. And it's not just the government: The music industry has used similar techniques to identify the users of illegal download services.
There's not a whole lot you can do to prevent this data from being collected. You can use an anonymous proxy service, such as Anonymizer, but it can interfere with your use of the Web and can't guarantee that your identity will stay hidden in all circumstances. Or you can live with the fact that what you do on the Web cannot be regarded as truly private.
For past columns and online-only reviews, go to Tech Maven at www.businessweek.com/technology/wildstrom.htm