AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning
Abstract: Heads-up no-limit Texas hold’em (HUNL) is the quintessential game with imperfect information. Representative prior
works like DeepStack and Libratus heavily rely on counterfactual regret minimization (CFR) and its variants to tackle
HUNL. However, the prohibitive computational cost of CFR
iteration makes it difficult for subsequent researchers to learn
the CFR model in HUNL and to apply it in other practical applications. In this work, we present AlphaHoldem, a high-performance and lightweight HUNL AI obtained with an end-to-end self-play reinforcement learning framework. The proposed framework adopts a pseudo-siamese architecture to learn directly from the input state information to the output actions by competing the learned model against its historical versions. The main technical contributions include a
novel state representation of card and betting information, a
multi-task self-play training loss function, and a new model
evaluation and selection metric to generate the final model.
In a study involving 100,000 hands of poker, AlphaHoldem
defeats Slumbot and DeepStack using only one PC with three
days of training. At the same time, AlphaHoldem takes only 2.9
milliseconds for each decision using a single
GPU, more than 1,000 times faster than DeepStack. We release the match history data among AlphaHoldem, Slumbot,
and top human professionals in the author’s GitHub repository to facilitate further studies in this direction.
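As a concrete illustration of the kind of end-to-end state representation described above, the following is a minimal Python sketch, not the authors' released code: each group of cards is encoded as a binary suit-by-rank plane, and the planes are stacked into a tensor that a convolutional policy network could map directly to actions. The six-channel layout and the card_plane/encode_state helper names are illustrative assumptions.

# Minimal sketch of a card-tensor state encoding (assumed layout,
# not the paper's exact design): one 4 (suits) x 13 (ranks) binary
# plane per card group, stacked into a (channels, 4, 13) tensor.
import numpy as np

RANKS = "23456789TJQKA"
SUITS = "cdhs"

def card_plane(cards: list[str]) -> np.ndarray:
    """Encode cards like ['As', 'Td'] as a 4x13 binary plane."""
    plane = np.zeros((4, 13), dtype=np.float32)
    for card in cards:
        rank, suit = card[0], card[1]
        plane[SUITS.index(suit), RANKS.index(rank)] = 1.0
    return plane

def encode_state(hole: list[str], flop: list[str],
                 turn: list[str], river: list[str]) -> np.ndarray:
    """Stack per-street planes into one card tensor."""
    public = flop + turn + river
    planes = [
        card_plane(hole),           # private hole cards
        card_plane(flop),           # flop
        card_plane(turn),           # turn
        card_plane(river),          # river
        card_plane(public),         # all public cards
        card_plane(hole + public),  # everything this player can see
    ]
    return np.stack(planes)         # shape: (6, 4, 13)

# Example: a state on the turn (river not yet dealt).
state = encode_state(hole=["As", "Kd"], flop=["Qh", "Jh", "2c"],
                     turn=["9s"], river=[])
print(state.shape)  # (6, 4, 13)

In a design of this kind, betting information would be encoded as an analogous stacked tensor, letting a single network consume both inputs without hand-crafted poker abstractions.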