Hadoop Wins Converts Outside Silicon Valley

Open-source Hadoop has spread far from its Silicon Valley roots

Two years ago, when the Detroit Crime Commission began collecting and analyzing the social media posts of suspected criminals, it found Excel wasn’t up to the task. The 11-person agency began using Hadoop, a software suite developed in the early 2000s to help Web giants such as Google and Yahoo! store and analyze mountains of data. “Several million pieces of content is a lot and cannot easily be analyzed within any Excel spreadsheet, so we needed to do something a lot more robust,” says Lyle Dungy, director for intelligence at the Detroit Crime Commission. The software has already helped reveal a relationship between two suspected criminal organizations. “There’s so much digital evidence that’s out there. Most agencies are not exploiting it,” says Dungy.

A lot of technology never makes it out of Silicon Valley, let alone worms its way into a small city agency. Yet in the decade since Hadoop was developed, a cottage industry has emerged around the open-source software. The agricultural giant Monsanto relies on Hadoop to analyze and predict weather patterns, while the Indian government uses it to store information on more than 500 million citizens for its national identity registry. India’s biometric database, said to be the world’s largest, is so robust it can handle as many as 4 million logins per minute.

Dozens of companies peddle some variant of Hadoop—some give it away but charge for consulting and support services. The global Hadoop market was valued at $1.5 billion in 2012 and is expected to expand to $50.2 billion by 2020, according to a March report from Allied Market Research. Its leaders—Cloudera, Hortonworks, and MapR—each have attracted hundreds of millions of dollars in venture capital investments.

Hadoop’s creator, Doug Cutting, holds the title of chief architect at Cloudera. The programmer was working at Yahoo! when he began writing the first scraps of code for the software and named it after his child’s toy elephant. “The trick for me is not letting it go to my head,” he jokes.

Because Hadoop is open-source, businesses don’t need to worry about being beholden to a single software vendor. “No one will ever have more than 15 or 20 percent of the committers, so you can’t dominate the community,” says MapR Chief Executive Officer John Schroeder, using the industry term for programmers who have permission to alter a program’s source code.

Hadoop lets companies handle larger data sets than traditional enterprise systems do, says Steffin Harris, who leads the North America big-data practice at the consulting firm Capgemini. The software is also cheaper than alternatives offered by giants such as Oracle and SAP. John Williams, senior vice president for platform operations at TrueCar, says the online car-buying service has saved “a huge amount of money” since it swapped its data analysis software from a big enterprise vendor in 2013 for Hortonworks’ version of Hadoop. TrueCar’s upfront cost for the software and the equipment to run it fell to 23¢ per gigabyte of data from $19 per gigabyte thanks to the switch, saving almost $20 million for the company, according to Williams.

“Every organization, whether an IBM, a Teradata, an Oracle, an SAP, now has Hadoop in their architecture,” says Tom Reilly, CEO of Cloudera, in which Intel owns an 18 percent stake. Hewlett-Packard has invested $50 million in Hortonworks, and HP’s chief technology officer, Martin Fink, sits on the company’s board.

Although they are rivals, all Hadoop providers share a common purpose, says Hortonworks CEO Rob Bearden: making the technology “mind-numbingly simple and reliable.” Early adopters of Hadoop were “very unhappy with the actual implementation and actual robustness,” says Chris Poulin, the lead partner of Patterns & Predictions, which has been using one form or another of the software since 2007. Based in Portsmouth, N.H., the company is using Cloudera’s version on a project for Darpa, a Department of Defense agency, to identify military personnel at risk of committing suicide. Says Poulin: “We’re only now just getting to the point where the infrastructure is stable enough and manageable enough.”

Cutting says the software has evolved to where he can now devote most of his time at Cloudera to other projects. These days, he says, “I just fix bugs and add features.”
