Web Search: On to "Sense-Making"

IBM's Dan Gruhl and Andrew Tomkins explain how Big Blue's WebFountain technology tries to answer why questions

Google is suddenly on the tip of every investor's tongue. But things easily could have turned out otherwise. Eight years ago, two Stanford University computer-science students were trying to catch up with researchers at IBM (IBM) in the race to create a new kind of Internet search engine. Instead of merely examining words in a document to see how closely they matched a query, the technology both teams were working on would factor in the Internet's rich topology of links. A document with 50 other pages linking to it would be considered more valuable than a similar document with only two incoming links.
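The link-counting idea can be illustrated with a minimal sketch. It simply counts distinct inbound links and sorts candidate pages by that count; it is not the algorithm either team actually built, and the page names and the functions `inbound_link_counts` and `rank_by_inbound_links` are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: each pair is (source_page, target_page).
links = [
    ("a.example", "popular.example"),
    ("b.example", "popular.example"),
    ("c.example", "popular.example"),
    ("a.example", "obscure.example"),
]


def inbound_link_counts(link_pairs):
    """Count how many distinct source pages link to each target page."""
    distinct = set(link_pairs)  # ignore repeated links from the same source
    return Counter(target for _source, target in distinct)


def rank_by_inbound_links(candidates, link_pairs):
    """Order candidate pages from most to fewest inbound links."""
    counts = inbound_link_counts(link_pairs)
    return sorted(candidates, key=lambda page: counts[page], reverse=True)


ranked = rank_by_inbound_links(["obscure.example", "popular.example"], links)
```

Here the page with three inbound links is ranked ahead of the page with one, even if both match the query words equally well.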

The Stanford students solved the riddle first, spawning Google -- and, ultimately, a public offering that in a few months could value their company in excess of $20 billion.

IBM's search scientists may have faded from public view, but they never stopped thinking about how to help Net surfers zero in more accurately on Web-based information. After several years in stealth mode, last year IBM unveiled a self-described "sense-making" technology it calls WebFountain, which is designed to filter information in a highly sophisticated way.


Instead of scouring Web pages for keywords and links, as most search engines do, WebFountain aims to spot the opinions presented on the pages. Rather than merely asking for information about a Sony (SNE) CyberShot digital camera, for example, a Web surfer could feasibly ask: "What do people think about the new Sony CyberShot digital camera?"

Just as intriguing, WebFountain is attempting to bring a time axis to Internet search. Today, search engines provide a snapshot of how the Web views a certain topic. But it's largely a medium without a memory. That makes it next to impossible to spot trends or easily analyze how things shift over time -- which could be compelling information. Imagine the value a marketer would get from an answer to the question: "How have mentions of my brand changed over the last six months?"
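To make the time-axis idea concrete, here is a minimal sketch that assumes dated snapshots of page text are already stored somewhere. The sample data, the brand name "Acme," and the function `mentions_by_month` are hypothetical, and the article does not describe how WebFountain itself computes such trends.

```python
from collections import Counter
from datetime import date

# Hypothetical dated snapshots of page text: (date_seen, text).
snapshots = [
    (date(2004, 1, 15), "Acme widgets are great"),
    (date(2004, 1, 20), "no brand here"),
    (date(2004, 2, 3), "Acme again, and ACME once more"),
]


def mentions_by_month(dated_texts, brand):
    """Count case-insensitive brand mentions per (year, month)."""
    counts = Counter()
    needle = brand.lower()
    for day, text in dated_texts:
        counts[(day.year, day.month)] += text.lower().count(needle)
    # Return the months in chronological order.
    return dict(sorted(counts.items()))


trend = mentions_by_month(snapshots, "Acme")
```

A single snapshot of the Web could not produce this table; it depends on keeping a dated history of what each page said.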

Despite WebFountain's potential, IBM has no designs to take on Google. Tucked away in the rolling hills surrounding IBM's Almaden research labs, about 10 miles south of San Jose, Calif., the WebFountain team talks about building a platform for business-intelligence services. IBM would provide the heavy-duty data-gathering and -parsing, around which other businesses can build vertical applications, such as a so-called reputation-management tool that Factiva sells to marketers -- Factiva pays IBM to run the infrastructure and crunch the data.

BusinessWeek Correspondent Ben Elgin recently hunkered down with WebFountain's top gurus, chief architect Dan Gruhl and chief scientist Andrew Tomkins, to learn more about their efforts. Here are edited excerpts of their conversation:

Q: Why do you folks bristle at being labeled a search company?


What we do is so much more. We call it "sense making." Search presents a tourist's view of information. It's a great way to start. But it's not the way to keep you up to date, show you trends, or help you understand the world around you. [IBM's technology tries to answer the question:] What does the landscape look like? And, more importantly, how is that landscape changing?

Q: Give us some examples.


For good or bad, some of the stuff we've kicked up has been surprising. There was a company that had invested nearly $1 billion in safety in a particular area. We did a search, and nobody knew about it. Their communications department had not gotten the message out to say, "Hey, we know there are problems, we invested money, and they're fixed."

There's a flip side. We had another company, where we ran a WebFountain analysis on one of their products, a shampoo. We found the No. 1 comment on the Web by far about their product was: It's the best way to remove grease from your garage floor. That's an alternative use for the product.

Q: What's the overarching goal? Is it different from Google's long-term goal of offering a search product that can answer any question, no matter how broad?


If you back up enough, everyone is interested in unlocking information. But if you get one step closer, suddenly there's a distinction. If you're a player in the search game, your model is that you have very sophisticated internal algorithms to understand a very short query. What you're going to return in response to the very short query is a document or a list of documents. You might deliver a little more, like a cluster of documents. But the unit you're operating on is the document.

We're on a different path. For instance, a lot of questions can be answered with the data. But you can't answer them with [conventional] query language. We'd like our query language to be richer, more graphical. We'd expect our users to be able to invest more time to learn the technology, really get familiar with it, unlock the potential of it. The things that we return would be not so much a document, but a visualization or a statistical aggregation. Or something that pops out of the data, such as a trend or a pattern.

We see ourselves as being in transition from asking "what" kinds of questions, like "what documents mention cat and dog," to "why" questions, like "why are there so many new documents that mention this?"

Q: Do you have a long-term overlap with Google's efforts to organize the world's information?


If storage got 10,000 times cheaper and processing 10,000 times cheaper, then Google and we could be looking at the same things. But because of the technology decisions you have to make very, very early, Google has gone to something where they refresh [their pages] every two months. They replicate it many, many times to respond to thousands of queries a second.

We crawl continuously. We look to be updating our store [of information] within 20 minutes of when a page [on some other Web site] changes. We'd be happy to answer two or three questions per minute that are very complex and change how business works. That's a very different target market. It's kind of like saying: Yeah, strictly speaking a Rolls-Royce Phantom and a Yugo are both cars. But the fact is, they're serving very different markets.

Q: Can this ever be delivered to a consumer audience?


We would have no objections to someone figuring that out. We're best at providing base capabilities that people can build on.

There's a challenge in what the interface would be so that everyone could use it. But that said, people have already become very savvy about how the Web works. The over-60 set is one of the fastest-growing groups of Internet users, and that's not going to stop. They're the people who have information needs. They have a need for technical information that far outstrips the needs of the 20-something crowd.
