TurboReBeL: 250$\times$ Accelerated Belief Learning for large Imperfect-Information Extensive-Form Games
Keywords: Game Theory, Imperfect Information Games, Counterfactual Regret Minimization, Deep Reinforcement Learning, Depth-limited Solving
TL;DR: We introduce a novel, general depth-limited solving framework for large IIEFGs, achieving unprecedented training efficiency without sacrificing the strengths of ReBeL.
Abstract: Recursive Belief-based Learning (ReBeL) provides a general framework for large-scale Imperfect-Information Extensive-Form Games (IIEFGs) by integrating self-play reinforcement learning with search. However, ReBeL suffers from prohibitive computational costs during training: each Public Belief State (PBS) sample requires $T$ iterations of Counterfactual Regret Minimization (CFR), and the PBS state space necessitates billions of samples for convergence. For example, training ReBeL on games such as Heads-Up No-Limit Texas Hold'em (HUNL) from scratch demands $4.5$ billion samples and $2$ million GPU hours. To address this, we propose TurboReBeL, which achieves a $\sim 250\times$ acceleration in training through two key innovations: (i) Single-Sample Multi-Iteration Generation: This core innovation fixes subgame strategies to CFR-averaged policies, generating data for all $T$ iterations in one sampling pass and yielding a theoretical $O\left(T\right)$ speedup. (ii) Isomorphic Data Augmentation: This technique enhances sample diversity through game-theoretic invariants (suit and chip isomorphism) with minimal overhead and no performance loss. Evaluations show that TurboReBeL matches ReBeL's exploitability in Turn Endgame Hold'em using approximately $0.4$\% of the training cost, and achieves comparable performance on HUNL with $450\times$ fewer samples. TurboReBeL is the first depth-limited solving framework that combines ultra-fast training, strong scalability, low exploitability, theoretical convergence guarantees, human-data-free training, and fast real-time decision-making, representing a fundamental breakthrough in solving IIEFGs.
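The core saving in innovation (i) can be sketched on a toy game: run one CFR-style solve and emit a training sample at every iteration, rather than paying a full $T$-iteration solve per sample. The sketch below is an illustrative assumption, not the authors' implementation — it uses plain regret matching on Rock-Paper-Scissors as a stand-in for CFR on a PBS subgame, and all function names (`solve_and_emit_samples`, `current_strategy`) are hypothetical.

```python
import numpy as np

# Toy sketch of single-sample multi-iteration generation (assumed, not the
# paper's code): regret matching on Rock-Paper-Scissors stands in for CFR
# on a public-belief-state subgame.

PAYOFF = np.array([[0., -1., 1.],   # rows/cols: Rock, Paper, Scissors
                   [1., 0., -1.],
                   [-1., 1., 0.]])

def current_strategy(regrets):
    """Regret-matching policy: play in proportion to positive regret."""
    pos = np.maximum(regrets, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(3, 1.0 / 3.0)

def solve_and_emit_samples(T, seed=0):
    """One solve of T regret-matching iterations, emitting one training
    sample (iteration, CFR-averaged policy) per iteration - so a single
    sampling pass yields T samples instead of one, the O(T) speedup."""
    rng = np.random.default_rng(seed)
    regrets = [rng.random(3), rng.random(3)]   # random start breaks symmetry
    strat_sum = [np.zeros(3), np.zeros(3)]
    samples = []
    for t in range(1, T + 1):
        s = [current_strategy(r) for r in regrets]
        for p in (0, 1):
            strat_sum[p] += s[p]
        # Action values for each player against the opponent's current mix
        # (RPS is symmetric, so the same payoff matrix serves both players).
        v0 = PAYOFF @ s[1]
        v1 = PAYOFF @ s[0]
        regrets[0] += v0 - s[0] @ v0
        regrets[1] += v1 - s[1] @ v1
        avg = strat_sum[0] / t                 # CFR-averaged policy so far
        samples.append((t, avg.copy()))        # one sample per iteration
    return samples

samples = solve_and_emit_samples(1000)
```

With $T = 1000$, one pass produces 1000 samples, and the averaged policy of the final samples approaches the uniform equilibrium of Rock-Paper-Scissors — mirroring how TurboReBeL fixes subgame strategies to CFR-averaged policies and harvests every iteration as training data.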
Primary Area: reinforcement learning
Submission Number: 5768