Data Analytics: Crunching the Future
The technicians at SecureAlert’s monitoring center in Salt Lake City sit in front of computer screens filled with multicolored dots. Each dot represents someone on parole or probation wearing one of the company’s location-reporting ankle cuffs. As the people move around a city, their dots move around the map. “It looks a bit like an animated gumball machine,” says Steven Florek, SecureAlert’s vice-president of offender insights and knowledge management. As long as the gumballs don’t go where they’re not supposed to, all is well.
The company works with law enforcement agencies around the U.S. to keep track of about 15,000 ex-cons, meaning it must collect and analyze billions of GPS signals transmitted by the cuffs each day. The more traditional part of the work consists of making sure that people under house arrest stay in their houses. But advances in the way information is collected and sorted mean SecureAlert isn’t just watching; the company says it can actually predict when a crime is about to go down. If that sounds like the “pre-cogs”—crime prognosticators—in the movie Minority Report, Florek thinks so, too. He calls SecureAlert’s newest capability “pre-crime” detection.
Using data from the ankle cuffs and other sources, SecureAlert identifies patterns of suspicious behavior. A person convicted of domestic violence, for example, might get out of jail and set up a law-abiding routine. Quite often, though, SecureAlert’s technology sees such people backslide and start visiting the restaurants or schools or other places their victims frequent. “We know they’re looking to force an encounter,” Florek says. If the person gets too close for comfort, he says, “an alarm goes off and a flashing siren appears on the screen.” The system doesn’t go quite as far as Minority Report, where the cops break down doors and blow away the perpetrators before they perpetrate. Rather, the system can call an offender through a two-way cellphone attached to the ankle cuff to ask what the person is doing, or set off a 95-decibel shriek as a warning to others. More typically, the company will notify probation officers or police about the suspicious activity and have them investigate. Presumably with weapons holstered. “It’s like a strategy game,” Florek says. (Before Bloomberg Businessweek went to press, Florek left the company for undisclosed reasons.)
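At its core, the proximity alarm Florek describes is a distance check between each GPS fix from a cuff and the places an offender is barred from approaching. What follows is an illustration under stated assumptions, not SecureAlert’s implementation. Each exclusion zone is modeled as a circle (a center point plus a radius in meters), distance uses the standard haversine great-circle formula, and the `Zone` class, the `check_fix` function, and the sample coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


@dataclass
class Zone:
    """Hypothetical exclusion zone: a center point and a radius in meters."""
    name: str
    lat: float
    lon: float
    radius_m: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def check_fix(lat: float, lon: float, zones: list[Zone]) -> list[str]:
    """Return the names of all zones that contain a single GPS fix."""
    return [z.name for z in zones if haversine_m(lat, lon, z.lat, z.lon) <= z.radius_m]
```

With a single zone `Zone("victim_workplace", 40.7608, -111.8910, 300.0)`, a fix at `(40.7610, -111.8912)` lies under 30 meters from the center and returns `["victim_workplace"]`. A real system would presumably also weigh repeated visits over days, the backsliding pattern described above, before raising an alarm.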
It used to be that a company the size of SecureAlert, with about $16 million in annual revenue, couldn’t engage in such a real-world chess match. For decades, only Fortune 500-scale corporations and three-letter government agencies had the money and resources to pull off this kind of data crunching. Wal-Mart Stores is famous for using data analysis to adjust its inventory levels and prices. FedEx earned similar respect for tweaking its delivery routes, while airlines and telecommunications companies used this technology to pinpoint and take care of their best customers. But even at the most sophisticated corporations, data analytics was often a cumbersome, ad hoc affair. Companies would pile information in “data warehouses,” and if executives had a question about some demographic trend, they had to supplicate “data priests” to tease the answers out of their costly, fragile systems. “This resulted in a situation where the analytics were always done looking in the rearview mirror,” says Paul Maritz, chief executive officer of VMware. “You were reasoning over things to find out what happened six months ago.”
In the early 2000s a wave of startups made it possible to gather huge volumes of data and analyze it at record speed, à la SecureAlert. A retailer such as Macy’s that once pored over last season’s sales information could shift to looking instantly at how an e-mail coupon for women’s shoes played out in different regions. “We have a banking client that used to need four days to make a decision on whether or not to trade a mortgage-backed security,” says Charles W. Berger, CEO of ParAccel, a data analytics startup founded in 2005 that powers SecureAlert’s pre-crime operation. “They do that in seven minutes now.”
Now a second wave of startups is finding ways to use cheap but powerful servers to analyze new categories of data such as blog posts, videos, photos, tweets, DNA sequences, and medical images. “The old days were about asking, ‘What is the biggest, smallest, and average?’ ” says Michael Olson, CEO of startup Cloudera. “Today it’s, ‘What do you like? Who do you know?’ It’s answering these complex questions.”
The big bang in data analytics occurred in 2006 with the release of an open-source system called Hadoop. The technology was created by a software consultant named Doug Cutting, who had been examining a series of technical papers released by Google. The papers described how the company spread tremendous amounts of information across its data centers and probed that pool of data for answers to queries. Where traditional data warehouses crammed as much information as possible on a few expensive computers, Google chopped up databases into bite-size chunks and sprinkled them among tens of thousands of cheap computers. The result was a lower-cost and higher-capacity system that lots of people can use at the same time. Google uses the technology throughout its operations. Its systems study billions of search results, match them to the first letters of a query, take a guess at what people are looking for, and display suggestions as they type. You can see the bite-size nature of the technology in action on Google Maps as tiny tiles come together to form a full map.
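The chop-and-sprinkle approach can be illustrated with the two-step pattern Google’s MapReduce paper made famous. In a map step, each machine works on its own chunk. In a reduce step, the partial answers are merged. This is a single-machine sketch only. It assumes a word count as the query and uses a thread pool to stand in for the tens of thousands of cheap computers, and the function names are illustrative rather than Hadoop’s actual API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Chop records into bite-size chunks, the way a cluster spreads data across machines."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: count words in one chunk, independently of every other chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts."""
    return left + right


def word_count(records, chunk_size=2, workers=4):
    """Run the map step over all chunks in parallel threads, then reduce the partial results."""
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())
```

Each chunk is counted independently, so adding machines (here, workers) lets the map step run in parallel. Only the much smaller partial counts travel on to the reduce step.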
Cutting created Hadoop to mimic Google’s technology so the rest of the world could have a way to sift through massive data sets quickly and cheaply. (Hadoop was the name of his son’s toy elephant.) The software first took off at Web companies such as Yahoo! and Facebook and then spread far and wide, with Walt Disney, the New York Times, Samsung, and hundreds of others starting their own projects. Cloudera, where Cutting, 48, now works, makes its own version of Hadoop and has sales partnerships with Hewlett-Packard and Dell.
Dozens of startups are trying to develop easier-to-use versions of Hadoop. For example, Datameer, in San Mateo, Calif., has built an Excel-like dashboard that allows regular business people, instead of data priests, to pose questions. “For 20 years you had limited amounts of computing and storage power and could only ask certain things,” says Datameer CEO Stefan Groschupf. “Now you just dump everything in there and ask whatever you want.” Top venture capital firms Kleiner Perkins Caufield & Byers and Redpoint Ventures have backed Datameer, while Accel Partners, Greylock Partners, and In-Q-Tel, the investment arm of the CIA, have helped finance Cloudera.
Past technology worked with data that fell neatly into rows and columns: purchase dates, prices, the location of a store. Amazon.com, for instance, would use traditional systems to track how many people bought a certain type of camera and for what price. Hadoop can handle data that don’t fit into spreadsheets. That ability, combined with Hadoop’s speedy divide-and-conquer approach to data, lets users get answers to questions they couldn’t even ask before. Retailers can dig into not just what people bought but why they bought it. Amazon can (and does) analyze its website logs to see what other items people look at before they buy that camera, how long they look at them, and whether certain colors on a Web page generate more sales, then synthesize all that into real-time intelligence. Are shoppers telling their friends about that camera? Is some new model poised to be the next big hit? “These insights don’t come super easily, but the information is there, and we do have the machine power now to process it and search for it,” says James Markarian, chief technology officer at data specialist Informatica.
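The website-log question can be sketched as follows. The function counts which items shoppers viewed earlier in the same session before buying a given camera. It assumes a hypothetical log format of `(session_id, timestamp, action, item)` tuples with the actions `"view"` and `"buy"`. The article does not describe Amazon’s actual logs or methods.

```python
from collections import Counter, defaultdict


def views_before_purchase(events, target_item):
    """Count items viewed earlier in the same session before target_item was bought.

    events: iterable of (session_id, timestamp, action, item) tuples, a
    hypothetical log format where action is "view" or "buy".
    """
    sessions = defaultdict(list)
    for session_id, ts, action, item in events:
        sessions[session_id].append((ts, action, item))

    viewed_first = Counter()
    for log in sessions.values():
        log.sort()  # put each session's events in timestamp order
        seen = set()
        for ts, action, item in log:
            if action == "buy" and item == target_item:
                viewed_first.update(seen - {target_item})
                break  # only the first purchase in a session counts
            if action == "view":
                seen.add(item)
    return viewed_first
```

Run over many sessions, the counts show which rival models shoppers compared before settling on the camera. Sessions that never end in a purchase contribute nothing.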
Take the case of U.S. Xpress Enterprises, one of the largest private trucking companies. Through a device installed in the cabs of its 10,000-truck fleet, U.S. Xpress can track a driver’s location, how many times the driver has braked hard in the last few hours, whether he sent a text message to the customer saying he would be late, and how long he rested. U.S. Xpress pays particular attention to the fuel economy of each driver, separating out the “guzzlers from the misers,” says Timothy Leonard, U.S. Xpress CTO. Truckers often keep their engines running and the air conditioning on after they’ve pulled over for the night. “If you have a 10-hour break, we want your AC going for the first two hours at 70 degrees so you can go to sleep,” says Leonard. “After that, we want it back up to 78 or 79 degrees.” By adjusting the temperature, U.S. Xpress has lowered annual fuel consumption by 62 gallons per truck, which works out to a total of about $24 million per year. Less numerically, the company’s systems also analyze drivers’ tweets and blog posts. “We have a sentiment dashboard that monitors how they are feeling,” Leonard says. “If we see they hate something, we can respond with some new software or policies in a few hours.” The monitoring may come off as Big Brotherish, but U.S. Xpress sees it as key to keeping its drivers from quitting. (Driver turnover is a chronic issue in the trucking business.)
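Leonard’s thermostat rule translates directly into a schedule. This is a minimal sketch that assumes one setpoint per hour of the break and picks 78 degrees Fahrenheit from his “78 or 79” range. The constant and function names are hypothetical, and the sketch says nothing about how U.S. Xpress actually sends settings to its trucks.

```python
COOL_DOWN_HOURS = 2     # "first two hours at 70 degrees"
SLEEP_SETPOINT_F = 70   # degrees Fahrenheit while the driver falls asleep
HOLD_SETPOINT_F = 78    # Leonard says "78 or 79"; choosing 78 is an assumption


def ac_schedule(break_hours: int) -> list[int]:
    """Return one thermostat setpoint (degrees F) for each hour of a rest break."""
    if break_hours < 0:
        raise ValueError("break_hours must be non-negative")
    return [SLEEP_SETPOINT_F if hour < COOL_DOWN_HOURS else HOLD_SETPOINT_F
            for hour in range(break_hours)]
```

For a 10-hour break, this yields two hours at 70 degrees followed by eight hours at 78. Those eight warmer hours are where the idling engine has less cooling to do.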
How are IBM and the other big players in the data warehousing business responding to all this? In the usual way: They’re buying startups. Last year, IBM bought Netezza for $1.7 billion. HP, EMC, and Teradata have also acquired data analytics companies in the past 24 months.
It’s not going too far to say that data analytics has even gotten hip. The San Francisco offices of startup Splunk have all the of-the-moment accoutrements you’d find at Twitter or Zynga. The engineers work in what amounts to a giant living room with pinball machines, foosball tables, and Hello Kitty-themed cubes. Weekday parties often break out; during a recent visit, it was a Mexican fiesta. Employees were wearing sombreros and fake moustaches while a dude near the tequila bar played the bongos.
Splunk got its start as a nuts-and-bolts tool in data centers, giving administrators a way to search through data tied to the low-level operations of computers and software. The company indexes “machine events,” the second-by-second records that computing devices produce to keep track of their actions. These could include a record of every time a server stores information, or the length of a cellphone call and the type of handset used. Splunk helps companies search through this morass, looking for events that caused problems or stood out as unusual. “We can see someone visit a shopping website from a certain computer, see that they got an error message while on the ladies’ lingerie page, see how many times they tried to log in, where they went after, and what machine in some far-off data center caused the problem,” says Erik Swan, CTO and co-founder of Splunk. Although Splunk started out as troubleshooting software for data centers, it has morphed into an analysis tool that can be used to fine-tune fraud detection systems at credit-card companies and to measure the success of online ad campaigns.
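Searching machine events at scale depends on indexing them, so that a query does not have to scan every raw record. The toy inverted index here maps each word in an event to the events that contain it. It then answers queries of the form “all events containing these terms within this time window.” It illustrates the general technique only; it is not Splunk’s implementation or its search language. The `EventIndex` class and the sample log lines are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9_.]+")  # crude tokenizer for lowercase log text


class EventIndex:
    """Toy inverted index over timestamped machine events (not Splunk's design)."""

    def __init__(self):
        self.events = []                  # (timestamp, raw text), by event id
        self.postings = defaultdict(set)  # token -> ids of events containing it

    def add(self, timestamp, text):
        """Store a raw event and index every token it contains."""
        event_id = len(self.events)
        self.events.append((timestamp, text))
        for token in TOKEN.findall(text.lower()):
            self.postings[token].add(event_id)
        return event_id

    def search(self, *terms, start=None, end=None):
        """Return events containing every term, optionally within [start, end]."""
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t.lower(), set()) for t in terms))
        hits = []
        for event_id in sorted(ids):
            ts, text = self.events[event_id]
            if (start is None or ts >= start) and (end is None or ts <= end):
                hits.append((ts, text))
        return hits
```

For example, intersecting the posting sets for `error` and `web01` narrows a search to one failing web server before any raw text is read. That is the kind of drill-down Swan describes.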
A few blocks away from Splunk’s office are the more sedate headquarters of IRhythm Technologies, a medical device startup. IRhythm makes an oversize plastic bandage called the Zio Patch that helps doctors detect cardiac problems before they become fatal. Patients affix the Zio Patch to their chests for two weeks to measure their heart activity. Rivals’ devices are much chunkier; patients typically wear them for just a couple of days and take them off to sleep or shower, which happen to be times when heart abnormalities often manifest. The upside of the waterproof Zio Patch is the length of time people wear it, but 14 days is a whole lot of data. Patients mail the devices back to IRhythm’s offices, where a technician feeds the information into Amazon’s cloud computing service.
IRhythm’s Hadoop system chops each 14-day recording into chunks and analyzes them with algorithms. Unusual activity gets passed along to technicians, who flag worrisome patterns for doctors. For quality control of the device itself, IRhythm uses Splunk, which monitors the strength of the Zio Patch’s recording signals, whether hot weather affects its adhesiveness to the skin, and how long a patient actually wore the device. On the Zio Patch manufacturing floor, IRhythm discovered that operations at some workstations were taking longer than expected. It used Splunk to go back to the day when the problems cropped up and found a computer glitch that was hanging up the operation.
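The chunking step can be sketched in three parts: split a long recording into fixed windows, score each window on its own, and pass only the outliers to a human. This sketch assumes the recording has already been reduced to one heart-rate reading per minute, so 14 days is 20,160 readings. It flags windows whose average falls outside a fixed band. The 40 to 150 beats-per-minute band is an arbitrary illustration, not a clinical criterion or IRhythm’s algorithm.

```python
from statistics import mean


def chunk(samples, size):
    """Split a long recording into fixed-size windows that can be analyzed independently."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def flag_windows(samples, window_size, low_bpm=40, high_bpm=150):
    """Return (window_index, mean_bpm) for windows whose average heart rate
    falls outside [low_bpm, high_bpm]. Thresholds are illustrative only."""
    flagged = []
    for i, window in enumerate(chunk(samples, window_size)):
        avg = mean(window)
        if avg < low_bpm or avg > high_bpm:
            flagged.append((i, avg))
    return flagged
```

Each window is scored without reference to the others, so the same windows could be spread across many machines, Hadoop style. Only the short flagged list reaches a technician.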
Mark Day, IRhythm’s vice-president of research and development, says he’s able to fine-tune his tiny startup’s operations the way a world-class manufacturer like Honda Motor or Dell could a couple of years ago. Even if he could have afforded the old-line data warehouses, they were too inflexible to provide much help. “The problem with those systems was that you don’t know ahead of time what problems you will face,” Day says. “Now, we just adapt as things come up.”
At SecureAlert, Florek says that despite the much-improved tools, extracting useful meaning from data still requires effort and, in his line of work, sensitivity. If an ankle-cuff-wearing parolee wanders out of bounds, there’s a human in the process to make a judgment call. “We are constantly tuning our system to achieve a balance between crying wolf and catching serious situations,” he says. “Sometimes a guy just goes to a location because he got a new girlfriend.”
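Balancing “crying wolf” against missing real trouble is, in part, a threshold choice. Raising the alarm threshold cuts false alarms but lets more serious cases slip through. This is a minimal sketch of that trade-off. It assumes a hypothetical history of past alerts, each with a risk score and a label saying whether the alert turned out to be serious.

```python
def alarm_tradeoff(scored_history, thresholds):
    """For each candidate threshold, count false alarms and missed incidents.

    scored_history: list of (risk_score, was_serious) pairs from past alerts,
    a hypothetical labeled data set.
    Returns a list of (threshold, false_alarms, missed) tuples.
    """
    results = []
    for t in thresholds:
        # Alarm fires when score >= t; a false alarm is a firing on a non-serious case.
        false_alarms = sum(1 for score, serious in scored_history if score >= t and not serious)
        # A miss is a serious case whose score stayed below the threshold.
        missed = sum(1 for score, serious in scored_history if score < t and serious)
        results.append((t, false_alarms, missed))
    return results
```

Comparing the rows side by side shows the cost of each setting in both directions. Where to settle remains the human judgment call Florek describes.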