Cisco’s Network Bugs Are Front and Center in Bankruptcy Fight

“The entire network often has to go down in order to patch.”
Illustration: Simon Abranowicz

Game of War: Fire Age, your typical melange of swords and sorcery, has been one of the top-grossing mobile apps for three years, accounting for hundreds of millions of dollars in revenue. So publisher Machine Zone was furious when the game’s servers, run by hosting company Peak Web, went dark for 10 hours last October. Two days later, Machine Zone fired Peak Web, citing multiple outages, and later sued.

Then came the countersuit. Peak Web argued in court filings that Machine Zone was voiding its contract illegally, because the software bug that caused the game outages resided in faulty network switches made by Cisco Systems, and according to Peak Web’s contract with Machine Zone, it wasn’t liable. In December, Cisco publicly acknowledged the bug’s existence—too late to help Peak Web, which filed for bankruptcy protection in June, citing the loss of Machine Zone’s business as the reason. The Machine Zone-Peak Web trial is slated for March 2017.

“Machine Zone wasn’t acting in good faith,” says Steve Morrissey, a partner at law firm Susman Godfrey, which is representing Peak Web. “They were trying to get out of the contract.” Machine Zone has disputed that assertion in court documents, but it declined to comment for this story. Cisco also declined to comment on the case, saying only that it tries to publish confirmed problems quickly.

There’s buggy code in virtually every electronic system. But few companies ever talk about the cost of dealing with bugs, for fear of being associated with error-prone products. The trial, along with Peak Web’s bankruptcy filings, promises a rare look at just how much or how little control a company may have over its own operations, depending on the software that undergirds it. Think of the corporate computers around the world rendered useless by a faulty update from McAfee in 2010, or of investment company Knight Capital, which lost $458 million in 30 minutes in 2012—and had to be sold months later—after new software made erratic, automated stock market trades.

Peak Web, founded in 2001, had worked with companies including MySpace, JDate, EHarmony, and Uber. Under its $4 million-a-month contract with Machine Zone, which began on April 1, 2015, it had to keep Game of War running with fewer than 27 minutes of outages a year, court filings show.

According to Machine Zone, the hosting service couldn’t make it a month without an outage lasting almost an hour. Another in August of that year was traced to faulty cables and cooling fans, according to the publisher.

Cisco’s networking equipment became a problem in September, says a person familiar with Peak Web’s operations, who requested anonymity to discuss the litigation. The company’s Nexus 3000 switches began to fail after trying to improperly process a routine computer-to-computer command, and because Cisco keeps its code private, Peak Web couldn’t figure out why. The person familiar with the situation says Cisco denied Peak Web’s requests for an emergency software fix, and as more switches failed over the next month, the hosting service’s staffers couldn’t move quickly enough to keep critical systems online.

Finally, late in October, came the 10 hours of darkness. Three people familiar with Peak Web’s operations say the lengthy outage gave the company time to deduce that the troublesome command was reducing the switches’ available memory and causing them to crash. The company alerted Cisco. Machine Zone’s attorneys wrote that Peak Web has “aggressively sought to place the blame elsewhere for its failures” and that it could have prevented the downtime. In December, Cisco confirmed to Peak Web that it had replicated the bug and issued a fix, according to e-mails filed as evidence in the lawsuit.

Networking equipment such as switches and routers, which carry the world’s internet and corporate data traffic, tend to be especially difficult to fix with a software patch, says John Pescatore, director for emerging security trends at the SANS Institute, an IT training and research company. “The entire network often has to go down in order to patch—very disruptive in the best of times,” he says. In July, Southwest Airlines blamed a failed, aging router for an outage that canceled hundreds of flights.

Cisco, the leader in the $42 billion business for corporate networking equipment, has had similar problems before. In one previously unreported incident, in 2014, a glitch in a Cisco Invicta flash storage system corrupted data and disabled the emergency-room computer systems at Chicago’s Mount Sinai Hospital for more than eight hours, says a person familiar with the incident. Cisco later froze shipments of Invicta equipment and discontinued the product line. In another unreported case, a Cisco server in 2012 overheated inside a data center at chipmaking equipment manufacturer KLA-Tencor, forcing the facility to close and costing the company more than $50 million, according to a person familiar with the matter.

Spokespeople for Mount Sinai and KLA-Tencor didn’t respond to requests for comment. While declining to speak about specific incidents, Cisco spokesman Nigel Glennie says that after the company was notified of server failures at two customer sites in 2012, it posted public notices about the “thermal event” and replaced the faulty devices. Peak Web had its own thermal event: Chapter 11.

The bottom line: Buggy software, particularly in networking equipment, can leave companies with few ways to fix underlying problems.

    Before it's here, it's on the Bloomberg Terminal.