The Ghost in the Machine

The Reverend Thomas Bayes lived in the 18th century, but his statistical theory is shaping data-management software in the 21st

Thank the mathematical muses. A group of wacky academics has written a total of 29 songs in praise of Bayesian philosophy, an up-and-coming statistical theory that tries to incorporate the vagaries of the real world into the staid realm of stats. Take this ditty, set to the tune of Walking in a Winter Wonderland: "It's a bad situation / You get a fault of segmentation / A long sleepless night, your program's not right / Strugglin' with the Bayesian paradigm." So what does this have to do with Microsoft?

A lot, actually. Microsoft (MSFT) is using Bayesian statistical principles to build sophisticated troubleshooting into its forthcoming XP operating system. Bayesian theory will also underpin the company's .NET strategy, which aims to make Microsoft software the foundation of not only the PC but most Internet transactions.

British software company Autonomy (AUTN), based in Cambridge, used Bayesian theory to develop technology that sorts through vast amounts of "unstructured information," such as e-mail and company reports, and intelligently directs relevant info to the people who need it. Autonomy's clients include the BBC, General Motors, Procter & Gamble, and the U.S. Defense Dept. And statisticians working in Atlanta are using Bayesian methods to prove that air pollution exacerbates childhood asthma.


The man behind this movement was an eccentric 18th-century British cleric, the Rev. Thomas Bayes. Experts believe that in developing his theories he was either trying to prove the existence of God or discover an elaborate way to cheat at cards. Today, those theories inspire statistical formulas that allow scientists to combine data with inference drawn from daily experience.

That's a heretical idea in the world of statistics. Traditional statisticians rely only on hard data, but Bayesians insist that real-world knowledge of how things work can be a powerful tool because it helps put seemingly unrelated information in the proper perspective. Economists might note, to take a simple example, that American turkey consumption tends to increase in November. A Bayesian would clarify this by observing that Thanksgiving occurs in this month. "Bayesian analysis pays big dividends in complex models where layers of data are interdependent and correlated in different ways," says Brad Carlin, a biostatistician at the University of Minnesota and the official keeper of the Bayesian songbook.

In recent years, Bayesian theory has moved from academia to the world of commercial software. Microsoft, which houses a staunch band of Bayesian adherents, integrates the ideas into its operating systems. "In some ways, it's a dream," says Eric Horvitz, the head of Microsoft's Adaptive Systems & Interactive Research Group. Horvitz and his 20-person team of researchers are responsible for the sophisticated Help system embedded in Windows XP and for Mobile Manager, the company's savvy e-mail filtering product, which was launched in June. Mobile Manager evaluates incoming e-mail on a user's PC and, based on the user's past behavior, decides which messages are important enough to forward to a pager, mobile phone, or alternate e-mail address.


Apparently it works. This reporter's e-mail to Horvitz requesting an interview rated 93 out of a perfect 100 on the urgency scale. Horvitz says that's most likely because it contained words such as "this week" and "interview." "It's like Sherlock Holmes looking through his magnifying glass and piecing together what it all means," says Horvitz. And bits of Bayesian functionality are embedded in many other Microsoft programs.
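
How a word-based urgency score of this kind might be computed can be sketched with a simple "naive Bayes" calculation. This is an illustration only, not Microsoft's actual method: the function name, the word probabilities, and the 20% prior below are all invented for the example.

```python
import math

def urgency_score(features, p_given_urgent, p_given_routine, prior_urgent):
    """Return the posterior probability that a message is urgent, on a 0-100 scale.

    Applies Bayes' rule with the naive assumption that each observed
    word or phrase is independent evidence. All inputs are hypothetical.
    """
    # Start from the prior belief (how often this user's mail is urgent).
    log_urgent = math.log(prior_urgent)
    log_routine = math.log(1.0 - prior_urgent)

    # Update the belief with each word or phrase we have statistics for.
    for feature in features:
        if feature in p_given_urgent and feature in p_given_routine:
            log_urgent += math.log(p_given_urgent[feature])
            log_routine += math.log(p_given_routine[feature])

    # Normalize the two hypotheses so they sum to 1.
    top = max(log_urgent, log_routine)
    urgent = math.exp(log_urgent - top)
    routine = math.exp(log_routine - top)
    return 100.0 * urgent / (urgent + routine)

# Hypothetical statistics "learned" from a user's past behavior:
# how often each phrase appears in urgent versus routine mail.
P_GIVEN_URGENT = {"interview": 0.30, "this week": 0.40}
P_GIVEN_ROUTINE = {"interview": 0.05, "this week": 0.10}

score = urgency_score(["interview", "this week"],
                      P_GIVEN_URGENT, P_GIVEN_ROUTINE, prior_urgent=0.20)
print(round(score, 1))
```

With these made-up numbers, the two phrases lift the score from a starting belief of 20 to roughly 86. A real system would learn its probabilities from the user's own mail and weigh far more signals than two phrases.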

These programs are undoubtedly cool, but the real quantum leap for Bayesian software should come in Microsoft's ballyhooed .NET platform, which has yet to debut. It will include a series of Bayesian features, collectively called the Notification Platform, which tie together information about devices, such as cell phones and pagers, and data about interpersonal communications and usage patterns. The software then intelligently decides how best to deliver information at the right time and place.

For example, imagine you are trying to contact your brother who lives in California. Instead of sending an e-mail or calling him, you use a part of the Notification Platform called "BestCom." This program identifies who you are and then scans your brother's preferences, his calendar, and his past responses to your messages. Then, a message pops up on your screen advising you to either send an e-mail or phone him in three hours, when he'll be out of a meeting and back in his office. The system even tentatively adds your phone call to his schedule, so he'll know to expect you.


Sound intrusive? Horvitz argues that it's less so than the Instant Messenger programs that are widely used today. Instead of interrupting your brother, you simply click on his icon and get instant info on his preferred way of being contacted.

Other Bayesians are targeting the corporate world. Autonomy used Bayes' methods to develop technology that can learn to identify patterns in information not collected in a database, such as e-mail and company memos. In doing this, it decides how information should be tagged, categorized, and directed.

For example, pharmaceutical company AstraZeneca uses Autonomy software so far-flung employees can automatically be kept abreast of research and initiatives specific to their work. "Many users said they had no idea that the information actually existed, or that they wouldn't have thought of looking in a particular source for that sort of information," says Duncan Fyfe, AstraZeneca's director of informatics. According to Autonomy CEO Mike Lynch, nondatabase information at most companies doubles every three months and makes up 80% of corporate communication. "The world is a complex place, and Bayesian inference helps computers to interpret it," he says.


To be sure, Autonomy is still trying to figure out how to find stellar revenue and earnings in this amazing technology. And not all real-world applications of Bayesian theories have received rave reviews. Take the Microsoft Office Assistant, Clippy, the friendly-yet-bothersome cartoon paperclip that uses Bayesian inference to offer help writing letters or creating Excel spreadsheets. Many users soon tired of Clippy's intrusions into their work routines. And for .NET to work to the Bayesian maximum would require a possibly unacceptable level of monitoring in daily life. Microsoft's Notification Platform, for example, would put users under nearly constant surveillance.

The company realizes that privacy issues remain paramount. Still, Bayesian systems will likely play a big role as we increasingly become overwhelmed in an information society. These systems hold tremendous promise to help people with increasingly complicated schedules connect and sort through the mountains of data created daily. Not bad for a cleric who lived more than 200 years before the first PC was ever booted up.

By Jane Black in New York

Edited by Alex Salkever
