By Andrew Grant
Even the best poker face won’t work against a new superhuman cardsharp.
For the first time, a computer algorithm has solved a game of poker, heads-up limit Texas Hold’em, making it unbeatable in the long run against any opponent. The achievement, detailed in the Jan. 9 Science, may help develop strategies that maximize return in a business negotiation or minimize the risk of terrorist attacks — even if an adversary knows the strategy.
“It’s a big step toward understanding games that are closer to real-world problems,” says Murray Campbell, a computer scientist at IBM’s Thomas J. Watson Research Center in Yorktown Heights, N.Y. Campbell was not involved in the study but helped develop Deep Blue, the computer that defeated chess champion Garry Kasparov in 1997.
The algorithm, developed by computer scientist Michael Bowling and his team at the University of Alberta in Edmonton, is the first to tackle a commonly played imperfect-information game, in which participants do not have full knowledge of past events. Just as poker players must act without knowing the cards their opponent is holding, researchers want algorithms that can make effective decisions based on robust but incomplete sets of data.
For decades, scientists have tried to create algorithms that win human games by blending the calculating prowess of computers with the decision-making strategies of game theory. Game theory is the branch of mathematics that computes optimum strategies in competitive encounters. In 2007, Alberta computer scientist Jonathan Schaeffer and colleagues solved the game of checkers by simulating the actions of players making the perfect move every turn (SN: 7/21/07, p. 36). It was an impressive achievement, yet the algorithm had a relatively straightforward task: For each turn, it would review the positions of all the pieces, evaluate its options and then choose the best one.
That luxury is not afforded to a computer poker player because it doesn’t know what cards an adversary holds. Another wrinkle is that successful poker players are unpredictable. In checkers, the best move in a given situation is the same for every game. But in poker, it’s wise to bluff every now and then, perhaps increasing the bet some fraction of the time with a weak hand to throw the opponent off. “Most algorithms can’t cope with that kind of uncertainty,” Bowling says.
Bowling and his colleagues at Alberta’s 20-year-old Computer Poker Research Group chose to tackle heads-up limit Texas Hold’em because it is popular and relatively simple. Two players receive two cards each and then can bet a fixed amount a certain number of times as five community, or shared, cards are revealed. The game has more than 300 trillion different situations in which a player must call the bet, raise the bet or fold.
The team developed an algorithm called Cepheus that gradually approached perfection by playing against itself. After each hand, Cepheus calculated a measure of regret, exploiting the benefit of hindsight to determine how much it had strayed from the optimal strategy. For two months, Cepheus ran on more than 4,000 computers, each playing over 6 billion hands a second; the algorithm constantly improved as it used its past regret to guide its play. After about a billion billion hands — “that’s more hands of poker than humanity has ever played,” Bowling says — the algorithm’s regret came very close to zero, the mathematical equivalent of perfect play.
By coming up with an optimal set of probabilities for each possible scenario in a hand, the researchers ensured that Cepheus would never lose money in the long run despite losing some hands. That would hold true even if the opponent knew the computer’s strategy. Bowling says Cepheus isn’t quite a perfect player, but its play is indistinguishable from ideal over the number of hands a person could play in a lifetime.
While Bowling isn’t planning any computer-human poker showdowns (a computer already beat poker pros at limit Hold’em in 2008), he says the program, which is available online, could be a great tool for amateurs learning the game. Aspiring players can square off against a perfect opponent for free rather than risk big money against seasoned but inevitably imperfect pros.
Beyond poker, Bowling says, scientists can apply this unprecedented combination of artificial intelligence and game theory to major real-world problems such as nabbing terrorists at airports and catching fare evaders on transit systems. In each of these situations, officials must come up with a security plan that is effective even if criminals either employ a surprise attack or get hold of the plan.
Now that Bowling and colleagues have conquered heads-up limit Hold’em, they are continuing work on three-player and no-limit Hold’em. Researchers also want to develop programs that exploit weak opponents, Schaeffer says. Cepheus never loses in the long run, but it also never strays from its strategy. Poker pros, on the other hand, will gladly take risks to bleed a predictable opponent of chips. “Humans are strangely good at that,” Schaeffer says.