The Brutal Fight to Mine Your Data and Sell It to Your Boss
On May 23, an email landed in the sales inbox of a San Francisco startup called HiQ Labs, politely asking the company to go out of business. HiQ is a “people analytics” firm that creates software tools for corporate human resources departments. Its Skill Mapper graphically represents the credentials and abilities of a workforce; its Keeper service identifies when employees are at risk of leaving for another job. Both draw the overwhelming majority of their data from a single trove: the material that is posted—with varying degrees of timeliness, detail, accuracy, and self-awareness—by the 500 million people on the social networking site LinkedIn.
The email HiQ received was from LinkedIn Senior Litigation Counsel Abhishek Bajoria. “It has come to LinkedIn’s attention that hiQ Labs, Inc. has used and is using processes to improperly, and without authorization, access and copy data from LinkedIn’s website” in violation of LinkedIn’s user agreement, it read. Bajoria called on HiQ to cease and desist from visiting LinkedIn’s site and to destroy the data it had culled. The email set off a feud that led, a month later, to the two companies meeting in federal court, with HiQ suing LinkedIn and LinkedIn accusing HiQ of violating state and federal law.
A small number of the world’s most valuable companies collect, control, parse, and sell billions of dollars’ worth of personal information voluntarily surrendered by their users. Google, Facebook, Amazon.com, and Microsoft—which bought LinkedIn for $26.2 billion in 2016—have in turn spawned dependent economies consisting of advertising and marketing companies, designers, consultants, and app developers. Some operate on the tech giants’ platforms; some customize special digital tools; some help people attract more friends and likes and followers. Some, including HiQ, feed off the torrents of information that social networks produce, using software bots to scrape data from profiles. The services of the smaller companies can augment the offerings of the bigger ones, but the power dynamic is deeply asymmetrical, reminiscent of pilot fish picking food from between the teeth of sharks.
The terms of that relationship are set by technology, economics, and the vagaries of consumer choice, but also by the law. LinkedIn’s May 23 letter to HiQ wasn’t the first time the company had taken legal action to prevent the perceived hijacking of its data, and Facebook Inc. and Craigslist Inc., among others, have brought similar actions. But even more than its predecessors, this case, because of who’s involved and how it’s unfolded, has spoken to the thorniest issues surrounding speech and competition on the internet.
The courtroom clash in July drew some of the biggest names in the American litigation bar and split some of the web’s high-profile civil liberties watchdog organizations, with the Electronic Frontier Foundation coming out in support of HiQ and the Electronic Privacy Information Center writing an amicus brief for LinkedIn. Depending on whom you talk to, the sides are arguing about free speech or privacy, the scourge of data scraping or the danger of digital monopolies. The outcome will determine who gets to control the wealth of information about ourselves that, often unwittingly, we’ve put at the disposal of anyone with a professional curiosity and an internet connection.
“People analytics” is a new term, but the concept is as old as the office job. Psychology had barely been founded before its practitioners were identifying the traits of a good streetcar driver or telephone switchboard operator. In 1917 a group of prominent psychologists was asked to evaluate and sort the hundreds of thousands of young men being drafted into the U.S. Army to fight in Europe. The ensuing decades saw the American military and intelligence agencies become centers of research for psychological evaluation and aptitude testing; after World War II, many of their scientists were hired to head personnel research departments at AT&T, General Electric, General Motors, and other iconic corporations. There they subjected armies of salesmen, bankers, engineers, and middle managers to surveys and aptitude tests, simulations and role-playing games.
Still, in practice the selection and retention of talent remains more art than science—and often primitive art, at that. At most companies, HR isn’t where the most interesting thinking is happening. But that’s changing, as the data scientists who brought us Amazon recommendation engines, online ad auctions, and dating algorithms apply predictive analytics to how we think and act at work. The goal is to go beyond traditional but little-examined practices—for instance, the job interview (often useless) and the raise (not always the best way to retain talent)—to subtler metrics and methods. Big companies are growing more interested as the cost of replacing valued workers becomes clearer. Credit Suisse Group recently estimated that reducing attrition by 1 percentage point saves the bank from $75 million to $100 million a year.
That’s where HiQ comes in. The company was the brainchild of Darren Kaplan, a former ad man who saw that sites such as monster.com, glassdoor.com, and, above all, LinkedIn had upended the balance of power between employers and employees. LinkedIn’s founding in 2002 had given workers a new platform for marketing themselves and made it easier for recruiters (not to mention business journalists) to find and woo them. Its economic model is built, in large part, on charging for special recruiter memberships—they run about $9,000 a year, and companies often buy more than one—that help with finding and contacting potential poaches.
Kaplan envisioned a technological defense against the forays of recruiters, an early warning system that would assist companies in identifying restless workers so their bosses could entice them to stay. (“Predictive attrition insights” is the term HiQ has come to use.) In January 2015, the company rolled out Keeper, in which variables such as “independence from employer brand,” “mobility history,” and “external footprint” are displayed on a color-coded dashboard and combined to calculate a worker’s “flight risk.” Skill Mapper was released earlier this year, not long after the company hired Mark Weidick as chief executive officer.
An engineer and entrepreneur who’s worked at Cisco Systems Inc. and AT&T Inc., Weidick has the graying crew cut and earnest intensity of a high school debate team coach. “I can’t help keeping track of time,” he says apologetically, after our conversation strays from the presentation on the conference room screen beside us. He describes HiQ’s limited offerings as the first building blocks of a workforce analytics arsenal. Investors evaluating a prospective merger would pay good money, he argues, to know who’s likely to stay at a company and who might leave. “Imagine you had the wherewithal to see that autonomous vehicle experts were migrating out of Google into Apple before it became public,” he says.
Cisco and other software companies already perform this sort of analysis when they’re scouting offices, surveying the talent landscape in candidate cities to see which is the best fit. HiQ’s pitch is that it can offer clients something comparable through data science backed by off-the-shelf software and tailored consulting. Reviews have been good. “My experience working with them has been fantastic,” says an HR executive at a HiQ customer with more than 20,000 employees. (Like other clients, he was leery of being quoted by name while the case was going on.) “They are really smart about the data and about presenting it so that it’s used to improve business and employee outcomes.”
Weidick had been on the job for only three months when LinkedIn sent its cease-and-desist letter, and his first thought was that there had been a misunderstanding. Many of HiQ’s 26 employees came from LinkedIn, and one of the startup’s co-founders, Rob Desantis, had been an early LinkedIn investor and board member. LinkedIn staff regularly attended an annual people analytics conference HiQ hosted. Two weeks after receiving the letter, Weidick wrote a bewildered email to LinkedIn’s general counsel, likening himself to “a dolphin that got caught in the tuna net.”
Others at HiQ were less surprised. Even as LinkedIn employees mingled at HiQ’s conferences, the larger company’s software engineers were implementing measures to block data scrapers, setting off a cat-and-mouse game with HiQ’s engineers. “LinkedIn had been aggressively complaining about what they considered unfair scraping practices for quite some time,” recalls Dan Miller, HiQ’s chief technology officer. “They went through a lot of trouble technically to make it difficult to collate that data. We obviously think they’re dead wrong to do that.”
In other words, as far as LinkedIn was concerned, HiQ was the tuna. When the larger company’s lawyers made that clear to Weidick, he hired the law firm Farella Braun & Martel. Deepak Gupta, a partner there, thought the case might interest his former professor, Laurence Tribe. Tribe, who teaches at Harvard Law School, is a constitutional law luminary and a liberal icon for arguing Supreme Court cases that expanded First Amendment protections and sought to overturn state anti-sodomy laws. He has also advocated for Peabody Energy Corp. in its fight against greenhouse gas regulation, however, and his expansive definition of free speech has led him to argue against net neutrality on behalf of Time Warner Inc. He estimates that he receives 20 or 30 appeals for legal help a day. “I usually just shrug them off and say, ‘I’m busy,’ ” he says. Among other things, he’s writing a book, teaching a new law school course, and suing President Trump for corruption.
Tribe has a weakness for certain internet law questions, though. In a seminal 1991 talk, he sought to delineate how the Constitution, written in the language of physical space and boundaries, should apply in the virtual reaches of cyberspace. What were the public squares and private rooms of the web? Who got to determine access? Should data be protected as speech? If so, how? Back then the internet was an exotic geek playground, but even though it’s now a global marketplace where trillions of dollars change hands, the courts have only begun to answer Tribe’s questions. When Gupta called, talking about a battle over control of social media data, Tribe says, “my constitutional nostrils flared.”
On July 27 he took a seat next to Gupta and another Farella partner in a federal courtroom in San Francisco for the second hearing in the case. HiQ had sued LinkedIn for unfair business practices and for violating the smaller company’s right to free speech under the state constitution, which has broader speech protections than the U.S. Constitution. The startup wasn’t seeking damages, only asking Judge Edward Chen to issue a preliminary injunction that would force LinkedIn to let HiQ use its data.
Arguing LinkedIn’s case was Donald Verrilli, who served as solicitor general under President Obama, and others from the firm Munger, Tolles & Olson. The demand LinkedIn had made in its cease-and-desist letter to HiQ rested largely on the Computer Fraud and Abuse Act (CFAA), which makes it a federal crime, with a potential punishment of 10 years in prison, to access a computer without authorization. The 1986 law has been amended multiple times, but its earliest version was meant to protect against a WarGames-style hack of government mainframes. LinkedIn was contending that, although people who posted on the site owned their own data, that data was stored on LinkedIn servers, and HiQ was trespassing.
HiQ’s data scraping not only violated LinkedIn’s user agreement, it also threatened the “privacy interest of LinkedIn’s members,” Verrilli told Judge Chen, “and the integrity of LinkedIn’s trust relationship with its members, which is essential to its business.” Keeper, Verrilli added, let companies snoop on LinkedIn members in ways they hadn’t signed up for and wouldn’t want: “It’s an anonymous surveillance of their behavior, to rat them out to their employers.”
Gupta countered that Verrilli was mischaracterizing HiQ’s methods and products. LinkedIn members have the option to mark their profiles entirely private or entirely public, with gradations in between. HiQ made use only of the data LinkedIn members had indicated they wanted visible to everyone on the internet.
For LinkedIn, however, the key distinction wasn’t between public and private or visible and invisible, but between a person browsing a website and a bot brigade copying data at scale. LinkedIn’s lawyers offer an analogy in one of their briefs: The site is like “a massive job fair, held at a convention center and open to all comers.” Into this gathering HiQ sends a metaphorical swarm of interns wearing body cameras so it can track the movements of every attendee and sell the resulting information to their employers. “LinkedIn would be well within its rights,” Verrilli and his colleagues conclude, “to send a letter instructing HiQ and its camera-clad minions to stay out of its fair.”
Surveillance is a loaded word, one HiQ doesn’t like. But the pitch such companies make is that they can pick up faint signals in public data that would otherwise go unnoticed. Those signals can reveal things that weren’t actually intended to be public: the tastes and tendencies evinced by our web search patterns, the medical condition revealed by our buying history, our growing boredom with our job. Last year the American Civil Liberties Union discovered that an analytics company called Dataminr Inc. was allowing law enforcement and domestic intelligence analysts to use special keyword and location search tools to track people through Twitter, which partially owns the company. (Dataminr has since discontinued the practice.)
Data-scraping bots are only one of the technologies forcing us to rethink privacy protections. In 2012 the Supreme Court ruled in United States v. Jones that when the police affixed a GPS tracking device to a car, they were invading the privacy of the driver, even though it would have been perfectly legal to gather the same information by following the car around. Verrilli, the solicitor general at the time, lost that case, but speaking by phone in early October, he praises the decision. “What the court said, and it’s directly applicable here, is ‘No, no. This is a difference in kind, not merely in degree,’ ” he says. “It’s a level of intrusion and a level of surveillance that you could not as a practical matter ever accomplish absent the use of this super-high-powered technology.” The expense and labor of old-fashioned surveillance imposed practical limits on its use, but just as today’s cops—provided they can get a warrant—need not physically tail someone to know where he spends his time, people analytics professionals need not rely on a battery of tests and role-playing games to get inside employees’ heads. The easier it gets to harvest and analyze information, the more actively that information has to be protected. That, Verrilli argues, is what LinkedIn is trying to do.
If that argument is only somewhat reassuring, HiQ’s argument is effectively that we’re on our own, and that this is the price we pay for today’s internet. “There’s probably lots and lots of applications that might make someone feel a little queasy, right?” Gupta told Judge Chen. “But the thing is, we can’t sit here today and police every possible business model that some entrepreneur in Silicon Valley might come up with. It’s public information. It’s the marketplace of ideas. It’s the engine of our country’s growth.” The reason Google can put the entire internet at our fingertips is that, like HiQ, it scrapes public data. That includes LinkedIn pages, which is why they tend to be among the top results if you Google a noncelebrity (unlike HiQ, Google has LinkedIn’s explicit permission to collect data).
Still, even those who might prefer that someone act as the guardian of our data might not cast LinkedIn for the role. The company doesn’t have the size or sway of Facebook, but as a business networking site it has no real competition, and like Facebook its dominance is built on the wealth of data it controls. This summer, Bala Iyer, Mohan Subramaniam, and U. Srinivasa Rangan wrote an article in the Harvard Business Review arguing that the rise of Facebook, Google, Amazon, and the like necessitated a rethinking of the idea of a monopoly. “Tomorrow’s monopolies won’t be able to be measured just by how much they sell us,” the authors wrote. “They’ll be based on how much they know about us and how much better they can predict our behavior than competitors.”
The old considerations—Is pricing competitive? Does the consumer have alternatives?—haven’t gone away, but they’ve been augmented by new questions. Companies that control data capable of predicting their customers’ choices could, for example, figure out how to constrain those choices, making such dominance all the more durable. As HiQ’s lawyers were at pains to point out, LinkedIn itself is in the data-mining business and is thus a competitor to HiQ. The larger company offers a service called Update Me as part of its premium membership for recruiters. As its name suggests, the product alerts recruiters when particular people change their LinkedIn pages, mark a work anniversary, or do something else that might signal a recruitable moment. Keeper, by contrast, updates its risk profiles only monthly, not in real time.
Gupta made sure in court to highlight the hypocrisy of a sentence from LinkedIn’s promotional pitch to recruiters: “And don’t worry—they don’t know you’re following them.” In his telling, LinkedIn wasn’t trying to prevent its members’ data from being mined and analyzed; it was just trying to muscle out a smaller competitor so it could have a new market to itself.
Like most legal disputes, the one between LinkedIn and HiQ is a battle of analogies, and near the end of the proceedings on July 27, Verrilli introduced one more. The information in your local library is public, but that doesn’t mean, he argued, “that you can break into the library with a crowbar at 2 in the morning because you’re seized with a desire to read Moby-Dick.” Libraries can impose reasonable limits on public information, and so, he said, can LinkedIn.
A few minutes later, Tribe rose to speak for the first time. When he was growing up, he recalled, library books had “a little tag inside that would tell you how often the book was taken out, and when.” Imagine, he went on, that someone wanted to collate that information to see what books were most popular. “For the government to make it a crime for me to make use of that information because they want to be the, perhaps, exclusive distributors of information about what’s popular to read would, of course, be unconstitutional.” That calculation didn’t change, Tribe argued, if it was a corporation rather than the government establishing the ban. “If LinkedIn has this power, so does Facebook, and the entire universe of cyberspace can be gobbled up by a small number of private owners,” he said. “That can’t be what the law of an open, democratic society with the First Amendment means.”
On Aug. 14, Judge Chen issued his ruling, and it was in favor of HiQ. He emphatically rejected LinkedIn’s interpretation of the CFAA, which would, he wrote, give a private company the power to choke off access to public data, with prohibitions “weaponized by the potential of criminal sanctions.” And although Chen was skeptical of some of Tribe’s broader First Amendment arguments, he thought HiQ had raised serious questions about antitrust violations. He arched a rhetorical eyebrow at how LinkedIn, purported guardian of its members’ secrets, enthusiastically marketed data-mining capabilities of its own “in a way that seems to afford little deference to the very privacy concerns it professes to be protecting in this case.” Chen also took the extraordinary step of enjoining LinkedIn from putting in place any measure, technological as well as legal, to prevent HiQ from accessing its site, and he ordered the company to remove any barriers already in place.
Speaking a few weeks after the ruling, Blake Lawit, LinkedIn’s vice president for legal, still seemed blindsided: “When did it happen that companies who have data are forced to provide all of it with no conditions to anyone who wants it?”
At HiQ, the mood was predictably cheerier—“high-fives all around,” in Weidick’s description. Still, the costs of the litigation had been substantial. Precious startup capital had been spent on lawyers, and countless man-hours had been devoted to trying, unsuccessfully, to come up with a business model that wouldn’t depend on LinkedIn.
LinkedIn swiftly announced that it would take its case to the 9th U.S. Circuit Court of Appeals, which, in the past, has been favorable to companies invoking the CFAA against data scrapers. The appeals court won’t hear the case until early 2018 at the soonest, and a Supreme Court decision, should the case make it that far, wouldn’t come for a few years. That’s an eon for a tech startup: By then, HiQ might actually be the analytics powerhouse its CEO, Weidick, envisions, or it might be defunct and remembered only as a cautionary tale.
The publicity around the case has led to more potential customers reaching out, Weidick says, but in recent months HiQ has also lost most of its employees, as more than a dozen data scientists, designers, and programmers, calculating the odds, have left for jobs at places not shadowed by an existential legal battle. That much attrition would be painful for any company, but it’s particularly galling for one in the attrition-insight business. “The irony is not lost on me,” Weidick says. “Not at all.”