Inside the 20-Year Quest to Build Computers That Play Poker
Four of the best professional poker players in the world spent most of January holed up at the Rivers Casino in Pittsburgh, losing. They’d show up before 11 a.m., wearing sweatpants and stylish sneakers, and sit down in front of computer screens. Each of them was supposed to play 1,500 hands of heads-up no-limit Texas Hold ’em online before going back to the hotel for the night. This often meant working past 10 p.m. Over the course of the day, Starbucks cups and water bottles piled up next to the players’ keyboards. Chipotle bags lay at their feet.
Every time one of the players made a move, the action was transmitted to a computer server sitting five miles away at Carnegie Mellon University. From there, a signal would travel another 12 miles to their opponent, a piece of software called Libratus running at the Pittsburgh Supercomputing Center in Monroeville, a nearby suburb. Libratus played eight hands at once — two against each opponent. It moved at a deliberate pace, slow enough to drive Jason Les, one of its human opponents, a bit mad. “It makes the days longer,” said Les, an earnest, athletic-looking man who seemed eager to take a few minutes off one afternoon last week. “Waiting should not affect me whatsoever, but sometimes you’re just like, ‘OK, is this going to be over yet?’”
Libratus, of course, never needs a break. It’s different from human players in other ways, too. People tend to think longer when there’s more money at stake. The computer plays most slowly on small pots, a result of having to search through all the additional possibilities that come with having more chips left to bet. Libratus also tends to make huge, sudden wagers, violating standard betting conventions by throwing its money into the pot in irregular amounts and at odd intervals.
Coming from a human player, behavior like this would be irritating, reckless and, over the long run, expensive. But Libratus’s main attribute as a poker player is that it’s inhumanly good. When the 20-day tournament at Rivers came to an end Monday, the humans had lost $1.8 million. (They didn’t actually have to pony up the cash; the money served only as a way of keeping score.) Tuomas Sandholm and Noam Brown, the computer scientists at Carnegie Mellon who built Libratus, celebrated the win as the first time a computer has beaten top poker players at a variant of no-limit Texas hold’em, the world’s most prominent poker game.
Experts in artificial intelligence have long used games as a way to develop and test their creations. Computers have surpassed the best human players at chess, checkers, backgammon, and Go. Poker is a distinct challenge because of the element of chance, and because the players don’t know what cards their opponents are holding. So-called imperfect-information games require the sort of human intelligence — like deceiving an opponent and sensing when she’s deceiving you — that computers lack.
“No limit hold’em is the game you see in tournaments, and it has the reputation of being more of an art than a science,” said Adam Kucharski, author of The Perfect Bet: How Science and Math Are Taking the Luck Out of Gambling. “There was the idea that this game would be safer for much longer from these machines.”
That idea has been blown up in recent weeks. In early January, researchers at the University of Alberta released a paper based on a contest in which their own AI, named DeepStack, beat 11 professional poker players.
Whether DeepStack beat Libratus to the punch is a matter of debate. Sandholm said that the pros who played against his bot were better than those DeepStack defeated. Michael Bowling, the head of the University of Alberta’s computer poker research group, conceded this point. But he questioned whether humans are at their best when playing continuously for nearly a month. DeepStack’s margin of victory was also three times Libratus’s.
Both men agree that poker AI has just crossed a significant threshold. For them this has little to do with poker itself. Hold’em is just a way to find sparring partners for their artificial intelligence programs, and the gains made by game-playing bots will filter back into applications like cybersecurity. “This is the main benchmark the community has settled on, but these algorithms are not for poker,” said Sandholm, who was once one of the world’s top-ranked windsurfers and kind of looks like Bill Gates. “They’re general purpose.”
DeepStack and Libratus play an unusual version of poker. The computers are matched up against a single opponent, as opposed to a group of players. The number of chips each player holds is reset after every hand, eliminating the complicated psychological game through which players with more chips intimidate poorer players by forcing them to make big bets. Eric Hollreiser, a spokesman for PokerStars, the world's leading online poker platform, said this limits any threat that AI poses to the poker industry. “While on a functional hand-by-hand basis it mimics poker play, it is far, far removed from the reality of what happens at tables,” he said.
There are other experiments going on in less controlled environments. Poker bots have been playing in online cash games for nearly as long as scientists have been building them in labs. They’ve historically played low-stakes games and haven’t been considered very skillful. But bots are spreading into higher-stakes contests, said Chris Grove, a gambling industry analyst and the publisher of Online Poker Report. “If you’re an online poker operator, this is probably your number one fraud concern, and probably by a pretty wide margin,” he said.
The poker industry and the academic poker world have been quietly collaborating for years. Everyone involved remains sketchy on the details. But both the people building commercial bots and those trying to combat them watch the academic work closely. Several of Bowling’s former students have gone on to work for online poker companies. At least one has sold bots used to play online.
“Of course a lot of gambling people are worried that it may kill internet gambling for money, because people are worried that bots are going to be so good that they’re going to be had,” said Sandholm. “That could happen, but that’s not really my concern.”
In poker slang, a computer program that can do your playing for you is called a “dream machine.” Participants in online forums swap notes about when suspicious activity might indicate robotic play — or war stories about how they’ve made their own bots.
PokerStars, which is owned by the Canadian gaming company Amaya, employs 70 people to combat this kind of fraud. Employees call players and ask them to describe their strategies on certain hands. The company has also sent e-mails to players requiring them to make videos in which users rotate the camera 360 degrees to show their surroundings, then play for over an hour with their hands and keyboards fully visible.
Bots don’t have to be wildly skilled at poker to be profitable for their operators — and dangerous to the industry. A program that can make modest profits by exploiting mediocre players may be worth it. But Darse Billings, the head of poker strategy at Gamesys, the UK-based online gaming company, said dream machines and academic AIs are using different techniques and trying to solve fundamentally different challenges. Beating bad players isn’t just a simplified version of beating elite players. It’s a completely separate problem.
More than anyone, Billings understands both poker worlds. He studied the game while getting a master's degree in computer science in the 1990s, then became a professional poker player to pay off his student loans. Several years later he went back to school to work with Jonathan Schaeffer, a computer scientist at the University of Alberta best known for writing software that could play checkers perfectly. Billings convinced Schaeffer to focus on poker next.
To solve checkers, Schaeffer had used a method that essentially attempted to calculate the best move in any relevant situation, without considering what had happened up to that point. But it didn’t make sense to think about each move as an isolated problem in a game like poker, where luck is involved and not everyone has access to all the relevant information. The University of Alberta researchers set out to develop an overall strategy. This entailed looking for what is known in game theory as a Nash equilibrium — a strategy for a two-person zero-sum game that cannot lose in expectation over the long run, no matter how the opponent responds.
A Nash equilibrium isn’t a single ideal style of play. The key to an equilibrium strategy in poker is to play the strongest potential hands while remaining unpredictable. “When you bet your strong hands, there needs to be some doubt,” said Billings. The team developed a cautious AI, dubbed Mr. Pink, and a very aggressive one, named Agent Orange. It’s hard to talk about a computer program that does this without sounding like you’re talking about something that actually thinks.
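The unpredictability at the heart of an equilibrium strategy can be made concrete with regret matching, the self-play building block behind the equilibrium-finding algorithms these labs developed. This is a minimal sketch on rock-paper-scissors, not code from either lab: each bot plays actions in proportion to how much it regrets not having played them in the past.

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / ACTIONS] * ACTIONS

def train(iterations, seed=0):
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(regrets[p]) for p in (0, 1)]
        moves = [rng.choices(range(ACTIONS), weights=strategies[p])[0]
                 for p in (0, 1)]
        for p in (0, 1):
            opponent_move = moves[1 - p]
            realized = payoff(moves[p], opponent_move)
            for a in range(ACTIONS):
                # Regret: how much better action a would have done than
                # the action actually played this round.
                regrets[p][a] += payoff(a, opponent_move) - realized
                strategy_sum[p][a] += strategies[p][a]
    # The *average* strategy over all iterations converges to equilibrium.
    return [[s / sum(strategy_sum[p]) for s in strategy_sum[p]]
            for p in (0, 1)]

avg = train(100_000)
```

For rock-paper-scissors the equilibrium is to play each action a third of the time: after enough self-play, the averaged strategies hover near those probabilities, and no fixed counter-strategy can beat them in the long run. Poker equilibria have the same flavor, mixing value bets and bluffs so that strong hands carry doubt.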
The equilibrium approach drew the University of Alberta’s Bowling, whose specialty is game theory, to poker in 2003. Sandholm, who sat on Bowling’s thesis committee at Carnegie Mellon, turned to poker the following year and has taken a similar approach. Sandholm and Bowling started the Annual Computer Poker Competition together in 2006, and have periodically played against top human players. Even as they compete, the labs have been gleaning insights from one another’s research ever since.
Both programs took big steps toward the endgame in the last several years. In January 2015, Bowling’s team published a paper showing how it had solved heads-up limit hold’em, a two-person poker game that is simpler than no-limit hold’em because of restrictions on how players can bet. Sandholm and Brown, a Ph.D. student who has been working with him on poker AI for the last five years, held their first “Brains v. AI” competition against top humans at Rivers Casino several months later. Their bot, named Claudico, lost $732,000 over 80,000 hands played against four professional players. Sandholm said the match was close enough to call a draw, a claim that at least one player disputed.
Sandholm and Brown say their AI has improved in several general areas since then. Claudico played well in the early stages of hands, but tended to make mistakes at the end. It bluffed at the wrong moments, and had trouble accounting for how the odds of the game changed based on the cards it knew had been removed from the deck. In its simplest form, this is the reasoning that says that if there are two kings on the table and you have two kings, your opponent can’t have any. Libratus has improved in all of those areas. Its creators remain coy on some other points, like specifically how it chooses to make adjustments based on what it learns over the course of a day of play.
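That card-removal logic comes down to straightforward combinatorics: enumerate the opponent's possible holdings from the cards you can't see. The card encoding and helper below are illustrative, not taken from Libratus.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [rank + suit for rank in RANKS for suit in SUITS]  # 52 cards

def opponent_holdings(board, hero):
    """Enumerate every two-card holding an opponent could still have,
    given the cards the hero can see."""
    seen = set(board) | set(hero)
    remaining = [card for card in DECK if card not in seen]
    return list(combinations(remaining, 2))

# Two kings on the board, and the hero holds the other two:
board = ["Kc", "Ks", "7h", "2d"]
hero = ["Kh", "Kd"]
holdings = opponent_holdings(board, hero)

# All four kings are visible to the hero, so none of the remaining
# two-card combinations can contain one.
kings_possible = [h for h in holdings if any(card[0] == "K" for card in h)]
```

With six cards accounted for, 46 remain, giving 1,035 possible opponent holdings, and the enumeration confirms that zero of them contain a king. Scaling this kind of deduction across every betting decision is part of what Claudico got wrong and Libratus got right.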
All the details of Libratus will eventually be revealed when its creators publish their findings. This kind of academic work tends to filter into real-world poker in various ways. The Annual Computer Poker Competitions have included entrants that also play in cash games, according to Brown. Bowling said his research papers are popular on message boards for people building bots. “There’s this whole separate group of people reading these papers and trying to understand them,” he said.
Billings joined the poker industry in 2008. He’s one of a handful of people who did so after leaving the University of Alberta's program. Most of them have been hired by companies that run gaming platforms. Richard Gibson struck out on his own, starting a company called Robot Shark Gaming that built AI programs for studying and playing strategic games, and then a fantasy sports company called SportsBid.
Gibson was finishing up a Ph.D. in 2013 when a group of professional players approached him, offering to pay for software they could use in training. Gibson was only given one person’s name, never met any of his clients in person, and isn’t sure how many people were in the group. “Even though they weren’t using it to gamble online, there was a stigma,” he said.
Gibson built multiple programs, and said he designed the software to demonstrate the effectiveness of various strategies; it couldn’t play on its own. In his most lucrative year, Gibson made about $100,000 on that project, and his clients paid another $20,000 to $30,000 in fees related to the computing power it took to run the software.
The anonymous pros weren’t Gibson’s only clients. In one case, he said someone paid him tens of thousands of dollars to spend about six months building a lightweight poker bot. He didn’t ask much about how it would be used — he didn’t want to know — but the design pointed to a specific application. “My clients wanted a standalone thing they could load onto their laptop,” he said. “I imagine they’re trying to play online with them.”
At the end of each night at Rivers, Les and his fellow poker pros would order takeout and pore over data about the day’s action in search of Libratus’s weaknesses. Early in the month, they woke up each morning optimistic that they had some new tricks. “There were specific exploits we identified in the first few days,” said Les. “We attacked them and attacked them, and now they’re gone.”
Libratus was also making adjustments. During the day, the program split its computing power between playing the hands in front of it and what Sandholm described as “continuous strategy improvement.” At night, the program focused entirely on strategy, using 600 nodes of the supercomputer, the equivalent of about 3,330 high-end MacBooks working in tandem.
In poker, as in other games that AI has played at the top levels, computers have developed strategies that filter back to human players. Les said he’s trying to figure out how to adapt some of Libratus’s irregular betting behavior to his own game. It’s hard. “We just simply do not have the mental capacity to do it,” he said.
If humans have reached the point at which their computer opponents are just too good for them, labs like the ones Sandholm and Bowling run face almost the opposite problem. Head-to-head matches against professionals are one thing. But there’s no clear path to turning Libratus and DeepStack into players that could be confident of beating a group of flawed humans. That’s because the equilibrium strategies the AIs use fall apart in multiplayer games, where the point isn’t to play perfectly but to identify and exploit the shortcomings in other people’s games.
Several years ago Bowling ran an experiment in which three bots played against one another. Two of them used his lab’s closest approximation to perfect play; the third was programmed to raise recklessly. At the end of the game, the dumbest bot had lost a small amount of money. One of the near-perfect players won big, but the other lost its shirt.
“That’s really the hard part. How do you reason about these games if you know you’re going to sit down with human players or other programs that aren’t very good?” said Bowling. “You’re going to have to be prepared for that.”