Beyond Game Theory Optimal: Profit-Maximizing Poker Agents for No-Limit Hold’em

Agents4Science 2025 Conference Submission 163

14 Sept 2025 (modified: 08 Oct 2025) · Submitted to Agents4Science · CC BY 4.0
Keywords: Game theory, GTO, exploitative strategy, counterfactual regret minimization (CFR), MCCFR, Deep CFR, NFSP, multiway evaluation, poker AI, reinforcement learning, self-play learning, Nash equilibrium, counterfactual regret, regret minimization, strategy convergence, Monte Carlo sampling, neural fictitious self-play, poker simulation, multi-agent systems
TL;DR: Finding game-theory-optimal plays and exploiting opponents in poker (no-limit hold'em)
Abstract: Game theory has grown into a major field over the past few decades, and poker has long served as one of its key case studies. Game-Theory-Optimal (GTO) play provides strategies that avoid loss in poker, but pure GTO does not guarantee maximum profit. To this end, we aim to develop a model that outperforms GTO strategies to maximize profit in No-Limit Hold'em, in both heads-up (two-player) and multiway (more than two players) situations. Our model first builds a GTO foundation and then goes further to exploit opponents. It plays many simulated poker hands against itself, adjusting its decisions until no action can reliably beat it, creating a strong baseline close to the theoretical best strategy. It then adapts by observing opponent behavior and adjusting its strategy to capture extra value. Our results indicate that Monte Carlo Counterfactual Regret Minimization (MCCFR) performs best in heads-up situations, while CFR remains the strongest method in most multiway situations. By combining the defensive strength of GTO with real-time exploitation, our approach aims to show how poker agents can move from merely not losing to consistently winning against diverse opponents.
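The self-play loop described in the abstract — repeatedly adjusting action frequencies until no single action can reliably beat the current strategy — is the regret-minimization idea at the core of CFR. The following is a minimal illustrative sketch (not the authors' implementation) of plain regret matching on rock-paper-scissors, a one-shot zero-sum game where the time-averaged strategy converges toward the uniform Nash equilibrium; all function names here are hypothetical.

```python
import random

# Payoff to the player choosing row action a against column action b
# (rock-paper-scissors is symmetric, so both seats use the same table).
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def regret_matching(regrets):
    # Turn cumulative positive regrets into a mixed strategy;
    # fall back to uniform when no action has positive regret.
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total > 0:
        return [p / total for p in positives]
    return [1.0 / len(regrets)] * len(regrets)

def train(iterations=20000, seed=0):
    rng = random.Random(seed)
    regrets = [[0.0] * 3, [0.0] * 3]        # cumulative regrets per player
    strategy_sum = [[0.0] * 3, [0.0] * 3]   # accumulated strategies for averaging
    for _ in range(iterations):
        strats = [regret_matching(regrets[p]) for p in range(2)]
        actions = [rng.choices(range(3), weights=strats[p])[0] for p in range(2)]
        for p in range(2):
            opp = actions[1 - p]
            realized = PAYOFF[actions[p]][opp]
            # Regret of each alternative action against the sampled opponent action.
            for a in range(3):
                regrets[p][a] += PAYOFF[a][opp] - realized
                strategy_sum[p][a] += strats[p][a]
    total = sum(strategy_sum[0])
    return [s / total for s in strategy_sum[0]]  # player 0's average strategy
```

With enough iterations, `train()` returns an average strategy close to (1/3, 1/3, 1/3); MCCFR applies the same regret-matching update at sampled decision points of the full game tree rather than to a single one-shot matrix.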
Supplementary Material: zip
Submission Number: 163