AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning
Abstract: Heads-up no-limit Texas hold’em (HUNL) is the quintessential game with imperfect information. Representative prior
works like DeepStack and Libratus heavily rely on counterfactual regret minimization (CFR) and its variants to tackle
HUNL. However, the prohibitive computational cost of CFR
iteration makes it difficult for subsequent researchers to learn
the CFR model in HUNL and to apply it in other practical applications. In this work, we present AlphaHoldem, a high-performance and lightweight HUNL AI obtained with an end-to-end self-play reinforcement learning framework. The proposed framework adopts a pseudo-siamese architecture to learn directly from the input state information to the output actions by competing the learned model against its historical versions. The main technical contributions include a
novel state representation of card and betting information, a
multi-task self-play training loss function, and a new model
evaluation and selection metric to generate the final model.
In a study involving 100,000 hands of poker, AlphaHoldem
defeats Slumbot and DeepStack using only one PC with three
days of training. At the same time, AlphaHoldem takes only 2.9
milliseconds for each decision using a single
GPU, more than 1,000 times faster than DeepStack. We release the match history data among AlphaHoldem, Slumbot,
and top human professionals in the author’s GitHub repository to facilitate further studies in this direction.
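As a concrete illustration of the kind of end-to-end state representation described above, the following is a minimal Python sketch, not the authors' released code: each group of cards is encoded as a binary suit-by-rank plane, and the planes are stacked into a tensor that a convolutional policy network could map directly to actions. The six-channel layout and the card_plane/encode_state helper names are illustrative assumptions.

# Minimal sketch of a card-tensor state encoding (assumed layout,
# not the paper's exact design): one 4 (suits) x 13 (ranks) binary
# plane per card group, stacked into a (channels, 4, 13) tensor.
import numpy as np

RANKS = "23456789TJQKA"
SUITS = "cdhs"

def card_plane(cards: list[str]) -> np.ndarray:
    """Encode cards like ['As', 'Td'] as a 4x13 binary plane."""
    plane = np.zeros((4, 13), dtype=np.float32)
    for card in cards:
        rank, suit = card[0], card[1]
        plane[SUITS.index(suit), RANKS.index(rank)] = 1.0
    return plane

def encode_state(hole: list[str], flop: list[str],
                 turn: list[str], river: list[str]) -> np.ndarray:
    """Stack per-street planes into one card tensor."""
    public = flop + turn + river
    planes = [
        card_plane(hole),           # private hole cards
        card_plane(flop),           # flop
        card_plane(turn),           # turn
        card_plane(river),          # river
        card_plane(public),         # all public cards
        card_plane(hole + public),  # everything this player can see
    ]
    return np.stack(planes)         # shape: (6, 4, 13)

# Example: a state on the turn (river not yet dealt).
state = encode_state(hole=["As", "Kd"], flop=["Qh", "Jh", "2c"],
                     turn=["9s"], river=[])
print(state.shape)  # (6, 4, 13)

In a design of this kind, betting information would be encoded as an analogous stacked tensor, letting a single network consume both inputs without hand-crafted poker abstractions.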