Collecta Launches *Really* Real-Time Search Engine

Remember not so very long ago how amazed you were that Google could return so many useful search results in just a split-second? Today, that kind of speed isn’t enough for many people. The real-time communications inside social networks and microblogging services such as Facebook, FriendFeed, and most of all Twitter have introduced a new immediacy to online interaction and news. Even Google concedes it’s not yet providing an adequate search experience for such real-time streams of information.

Into the breach comes a new startup called Collecta, which claims it’s the first truly real-time search engine. Today, it’s launching a Web site that draws information streams from the many blogs using WordPress, news services such as Fox, CNN, and Reuters, social aggregation sites like Mixx, Yahoo’s Flickr photo sharing site, and of course Twitter. Eventually, the company hopes to comb most of the Web. “There’s a lot more velocity in the creation of news and content today,” says Collecta CEO Gerry Campbell. He views real-time search as especially useful for watching streams of comments on live events like sports or breaking news, or for keeping track of current comments on products or brands. “The world needs a real-time search engine.” Indeed, some pundits such as John Battelle believe real-time search is the next big thing.

It's too early to tell whether Collecta will deliver on its promise. Twitter already has a search function for tweets, which it acquired with the purchase of Summize. But it's often not very real-time. And Twitter isn't the only source of real-time information, which also resides on other services such as Facebook and on various blogs and news services. So there's a need for a search service that cuts across the many sources of real-time information.

Some are already taking a stab at that. A raft of real-time search services mainly focused on Twitter have launched recently, and yet another, Crowdeye, launches today as well. Facebook just announced it's testing a new search service with a small number of its members, though for now those searches are focused on Facebook activities. There's also new speculation that Google, which has expressed interest in a search deal with Twitter, might be preparing a microblogging search service. Not least, there's FriendFeed, which aggregates feeds from many services, updating comments on them as they're posted (sometimes so fast it's overwhelming), and it has a useful search box.

Collecta relies upon an Internet standard called XMPP, for Extensible Messaging and Presence Protocol. Used in instant messaging, Web voice, and other time-sensitive services, it essentially lets data be pushed from one party to another the moment it's available, rather than waiting for the recipient to ask for it. Whereas the HTTP Web standard (more properly, protocol) concerns how we view and move around pages on the Web, XMPP is about real-time communication. Collecta's team includes Chief Technology Officer Jack Moffitt, who's on the XMPP Standards Board. Campbell, former president of search and content technologies for Reuters and former senior VP of search for AOL, was an investor in and adviser to Summize. (Update: The Register has a nice explanation of the value and challenge of XMPP.)
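
To make the push-versus-pull distinction concrete, here is a minimal Python sketch run on a simulated clock. It is not Collecta's code and uses no XMPP library; the names `polling_delays` and `PushFeed`, the five-second poll interval, and the timestamps are all illustrative assumptions, and network latency is ignored.

```python
from typing import Callable, List


def polling_delays(published_ms: List[int], poll_interval_ms: int) -> List[int]:
    """Delay before each item is seen by a client that polls at 0, T, 2T, ...

    This models the pull style: nothing is seen until the next request.
    """
    delays = []
    for t in published_ms:
        polls_needed = -(-t // poll_interval_ms)  # ceiling division
        seen_at = polls_needed * poll_interval_ms  # first poll at or after t
        delays.append(seen_at - t)
    return delays


class PushFeed:
    """Toy push model: subscribers are called back as soon as an item is published."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str, int, int], None]] = []

    def subscribe(self, callback: Callable[[str, int, int], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, text: str, published_at_ms: int) -> None:
        # In this simulation delivery is instantaneous; a real network adds latency.
        delivered_at_ms = published_at_ms
        for callback in self._subscribers:
            callback(text, published_at_ms, delivered_at_ms)


# Hypothetical publication times, in milliseconds on the simulated clock.
published_ms = [1200, 4700, 9000]

poll_delays_ms = polling_delays(published_ms, poll_interval_ms=5000)

push_delays_ms: List[int] = []
feed = PushFeed()
feed.subscribe(lambda text, published, delivered: push_delays_ms.append(delivered - published))
for t in published_ms:
    feed.publish(f"post at {t} ms", t)

print("polling delays (ms):", poll_delays_ms)
print("push delays (ms):   ", push_delays_ms)
```

With these made-up numbers, the polling client waits 3.8 seconds, 0.3 seconds, and 1 second to see the three posts, while the subscriber sees each one as it is published. Polling more often narrows the gap, at the cost of many more requests that come back empty.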

When I tried Collecta recently, the results were less than impressive, though that could have been because engineers were still working on the service and slowing things down. When you type in a search, the service says it's "getting real-time results," and lists them as they come in (within a quarter-second for WordPress blogs, Campbell says). There's a preview pane so you don't have to click away to see more of the Web page or post. And importantly, the service has brakes: You can select search options to include or exclude stories, comments, photos, videos, and the like, and there's a pause button to stop the real-time updating if you're getting overloaded.
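
Those "brakes" are easy to picture in code. The following Python sketch is purely hypothetical: `Result`, `ResultStream`, and the kind labels are invented for illustration, and it assumes that results arriving during a pause are held and shown on resume. Collecta hasn't said whether its pause button holds or discards them.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List


@dataclass(frozen=True)
class Result:
    kind: str   # hypothetical labels: "story", "comment", "photo", "video"
    title: str


class ResultStream:
    """Toy result list with include/exclude filtering and a pause button."""

    def __init__(self, include: Iterable[str]) -> None:
        self.include = set(include)       # kinds the user has opted to see
        self.paused = False
        self.visible: List[Result] = []   # what the user currently sees
        self._held: Deque[Result] = deque()  # arrivals held while paused (assumption)

    def receive(self, result: Result) -> None:
        if result.kind not in self.include:
            return  # excluded kinds are dropped outright
        if self.paused:
            self._held.append(result)
        else:
            self.visible.append(result)

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False
        while self._held:  # release held results in arrival order
            self.visible.append(self._held.popleft())


stream = ResultStream(include={"story", "photo"})
stream.receive(Result("story", "A"))
stream.receive(Result("comment", "B"))  # filtered out
stream.pause()
stream.receive(Result("photo", "C"))    # held until resume
titles_while_paused = [r.title for r in stream.visible]
stream.resume()
titles_after_resume = [r.title for r in stream.visible]
```

Here the comment never appears because comments were excluded, and the photo shows up only after the stream is resumed.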

It looks promising, but rough for now: After a couple of minutes of a search on "iran riots," I had one result (not especially useful), and after nine minutes, I had a grand total of six results. I don't doubt this will speed up considerably, and even in this demo, the results were being updated in seemingly real time. But as with any new search engine, good performance is mandatory.

While I find real-time search results quite useful at times (using not only Twitter but also a Firefox browser add-on that puts Twitter search results atop my Google results), I still wonder how many people are ready for this kind of firehose of information. Indeed, Campbell admits that the biggest challenge for Collecta is getting information from Web publishers fast enough.

I also think search results ordered strictly by time aren't going to be sufficient for most of us. Collecta will need to offer other ways to filter this firehose if it's to appeal to more than people with ADHD. Collecta plans to offer an application programming interface (API) to allow other software developers to create other services on top of it, so they may come up with such filters. These won't be trivial to create, though.

I'm also not yet sure whether there's as much commercial opportunity specifically in real-time search as some folks assume (though I think there's plenty of opportunity when that's combined with social search). Campbell says that while many real-time queries are not commercial, the opportunity to make money is much the same as in conventional search: ads related to the search terms. The uncertainty is whether real-time search terms will indicate buying intent as well as conventional search terms do.
