Yahoo Spinoff Tries to Ride Data Tsunami

Scientists at Yahoo who pioneered ways of dealing with huge amounts of information have started a new company that they hope will ride the coming sea change in how companies store and manage data.

Hortonworks, which counts Benchmark Capital and Yahoo as investors, was spun off by the Sunnyvale (Calif.) Internet company in late June. It’s one of several startups offering products and services based on Hadoop, increasingly popular free software for handling vast amounts of so-called unstructured data.

This type of information, which includes the ever-growing volume of Facebook updates, Twitter posts, office documents, and e-mails, can be expensive to put into conventional databases. That has generated interest in low-cost alternatives.

“Big data commoditization is coming and it’s clear that people are hoping to get more for less,” says Eric Baldeschwieler, chief executive officer of Hortonworks. At Yahoo, Baldeschwieler served in several different roles including vice-president and chief architect of Web search.

Hadoop Growth

Within five years, Baldeschwieler expects that more than half the world’s data will be managed by Hadoop. About 80 percent of the information that companies currently possess is unstructured.

Aside from Hortonworks, other startups such as Cloudera, Hadapt, and MapR Technologies are selling Hadoop-related products and services that will help companies better use the software on their networks. On Aug. 30, MapR Technologies received $20 million in Series B funding led by Redpoint Ventures, along with Lightspeed Venture Partners and New Enterprise Associates.

At Yahoo, Hadoop helped lower the cost of storing and processing data. As a result, Yahoo began keeping more information, says Baldeschwieler. It’s similar to when the price of computer storage dropped and people started to keep their e-mails instead of deleting them, which allowed users to search through years of messages. Keeping that information has helped Yahoo understand its customers better, which drives higher ad revenue. “They’re seeing a lot of economic return, which means they want to keep more data,” he says, adding that he thinks more companies will do the same.

Hadoop was inspired by Google’s MapReduce software and was developed by Yahoo scientists as an open-source project. Hadoop can be freely downloaded and run on standard servers sold by a variety of vendors rather than on pricey customized hardware and software from a single vendor.

Oracle’s Challenge

In contrast, Oracle sells its Exadata system to handle large quantities of structured information such as financial or other operational data. “It depends on the size of the cluster but a company may spend $6 million on an Oracle Exadata system and may spend a couple hundred thousand dollars on a Hadoop cluster to do similar work,” says Hadapt CEO Justin Borgman, who is working on a product that handles both structured and unstructured information.

“Hadoop plays in a much larger market than Exadata and is a materially cheaper way to process vast data sets,” says Peter Goldmacher, an analyst with Cowen & Co. “Oracle will never really compete for Hadoop data sets because it would destroy its traditional pricing model.”

Based on the Oracle earnings call in June, analysts including James Kobielus at Forrester Research now expect Oracle to make a Hadoop-related announcement at Oracle OpenWorld, which begins Oct. 2. Oracle declined to comment.

Cowen & Co. estimated in a July report that Oracle would charge about nine times more than a blended average of the newer big data vendors to solve the unstructured data challenge. Cowen predicts that the market for technologies to handle unstructured data “presents a growth opportunity that will be significantly larger” than the $25 billion relational database industry dominated by Oracle, IBM, and Microsoft. As of now, it’s unclear what percentage of that market Hadoop software will capture.

Market Shift

Hortonworks’ Baldeschwieler compares the changing market to the shift from mainframe computers to desktop computers and servers. As the cost of computing decreased, companies bought many more computers. Although each computer was less expensive, more companies could afford them, so the market grew much larger over time.

“Hadoop allows you to store data without knowing its relationship to other data,” says Mark Barrenechea, CEO of Silicon Graphics, which helped Yahoo optimize its hardware to get Hadoop running in the early days. “Once you know the relationships, you can go back in time and discover something.”

Barrenechea, a former Oracle employee, says he wouldn’t count his old employer out. Previous competitive technologies have become features in Oracle products, he says.

Even Baldeschwieler says there is still a growing need for Oracle relational databases. At Yahoo, he saw unstructured data grow quickly right along with the structured data that Oracle databases house. “All of the existing traditional data businesses can grow in a healthy manner forever and still the unstructured data is going to zoom by it,” he says, because there’s simply more raw information in the world. “We don’t have to displace any of the existing data stores, we just have to capture that growth.”
