The Web Database Company Google Just Bought

Google (GOOG) is in the process of acquiring ITA Software, an airfare information provider that brings the company into the realm of vertical travel search. It also creates potentially awkward competition with ITA customers such as Kayak and Bing (MSFT).

Travel isn't the only thing ITA does: A few years ago, a research division of the company started a build-your-own-database tool for Web data. Needlebase, as it's known, is a nifty way to give structure to disorganized and constantly changing information on any topic.

Needlebase, which started giving out free beta trials in January, uses machine learning to assemble scraped data from websites and other sources into a hosted database intended to power vertical search engines. It's similar to Metaweb's Freebase, the other semantic Web/structured data acquisition Google made this month, but where Freebase is one giant public database, each user's Needlebase account is private unless designated otherwise. Needlebase is designed for anyone who wants to organize and play with data, regardless of technical knowledge: an avid soccer fan looking to parse and visualize game statistics, for example. It's also built to be powerful and reliable enough for commercial vertical search engines to use as part of their back end.

The Needlebase acquisition could be a key indicator of Google's vertical search strategy. It was out of character for the search engine to buy something as domain-specific as ITA, but Needlebase is not domain-specific at all. As I wrote in a previous post about Google's forays into vertical search:

Google could potentially use ITA as a way to get into many more verticals without additional acquisitions or major new products. Perhaps Google was interested in vertical search, but it may be even more interested in an easy way to take massive amounts of unstructured data and give them structure. It would be the equivalent of spinning straw into gold.

a generic version of ITA's QPX

The 14-member Needlebase staff is led by Justin Boyan, ITA's vice-president of Web data integration. He's a seasoned online anonymity researcher who worked at NASA's Ames Research Center and who has been with ITA nearly 10 years. Boyan said in a recent phone interview that he's "optimistic" that Needlebase will fit into Google's plans for Boston-based ITA, and that Needlebase will continue to serve existing and new beta users.

Boyan described the impetus for Needlebase, which is a more generic version of some of the technology behind ITA's main airfare product, QPX:

It doesn't require solving the AI problem to take out the columns of a table. And there's no reason to have to labor over maintaining Perl scripts. It just seemed like a real nice match to the kinds of machine learning that we were already familiar with.

Cloud-based Needlebase starts with a wizard tool for scraping data from websites, including Javascript-heavy and form-driven ones, as well as CSV, XML, and Excel files. The secret sauce is that the next time it refreshes data from that source, it will remember what it learned about the user's edits, cleanups, and duplicate deletion, and apply that learning to the new data automatically. All the while, Needlebase normalizes, geocodes, fixes capitalization, and makes other tweaks so that data can be merged and queried.
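To make the refresh idea concrete, here is a minimal sketch of replaying remembered cleanups on newly scraped rows. It is not Needlebase's actual method: the product is described as using machine learning, while this sketch substitutes a plain lookup table of past corrections. Every name in it (`normalize`, `CleanupMemory`, `refresh`, the sample fields) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


def normalize(record: dict) -> dict:
    """Collapse stray whitespace and title-case every string value."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = " ".join(value.split()).title()
        cleaned[key] = value
    return cleaned


@dataclass
class CleanupMemory:
    """Hypothetical store of a user's past edits, replayed on each refresh.

    Assumes record values are hashable (strings, numbers).
    """
    corrections: dict = field(default_factory=dict)  # (field name, bad value) -> good value

    def remember(self, field_name, bad, good):
        self.corrections[(field_name, bad)] = good

    def apply(self, record: dict) -> dict:
        return {k: self.corrections.get((k, v), v) for k, v in record.items()}


def refresh(raw_rows, memory: CleanupMemory, key_field: str) -> list:
    """Normalize each row, replay remembered corrections, then drop duplicates by key."""
    seen = {}
    for row in raw_rows:
        row = memory.apply(normalize(row))
        seen.setdefault(row[key_field], row)  # keep the first row seen for each key
    return list(seen.values())
```

In use, a correction the user made once (say, `memory.remember("country", "Uk", "UK")`) would be applied again whenever `refresh` runs over a fresh scrape, and rows that normalize to the same key would be merged without further manual work.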

seeking: aggregators who can pay

Needlebase has been used so far to manage information about movies, jobs, hotels, events, weather, and oil spills, said Boyan. Check out sample projects for 2010 World Cup stats and heavy metal bands. Boyan said Needlebase is intended to be a commercial-grade tool. "We're looking for aggregators whose business is aggregation—people building vertical search engines, doing data gathering and analysis, and business analysts." He said he hopes to announce Needle's first two paying customers soon. Pricing will be cloud-style, pay-as-you-go, based on the amount of data each customer acquires, hosts, and publishes.

Needlebase has had virtually no publicity to date. With Google's name now behind it, the stakes have changed. Then again, the acquisition could also scare off potential customers who worry Google will compete with them in vertical search.
