A very large Internet company once had the noble impulse to share some of its data with the research community. It made three months of log files from its search service available to all. The company took many steps to preserve privacy, removing personal information and randomizing ID numbers in the belief that this would make it impossible to identify any of the more than 650,000 customers who’d used the service. But Internet hobbyists, professional researchers, and journalists were able to ferret out many of the users. No. 4417749, for example, was a Georgia widow. Another user appeared to be planning a murder. Today, the AOL Search Log Scandal is remembered as one of the weirdest missteps in Internet history.
That took place an epoch ago, way back in 2006. Now anyone with a few dollars and a knack for computers can rent some cloud capacity and set up a stack of totally free technologies to deal with enormous amounts of data. Managing data at that scale is a key part of functioning as a large Internet company. If you’re the intelligence apparatus of a global superpower, and your job is to keep an eye on people who are contemplating terrible acts, the data those companies hold is incredibly valuable. You’re going to do what you can to get your hands on it. Once you do, you can employ beautiful, supple pieces of software (some with point-and-click interfaces and little icons) to help you understand what you’re seeing. It’s powerful stuff.