Optimal Action Abstraction for Imperfect Information Extensive-Form Games

20 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Game Theory, Imperfect Information Games, Extensive-Form Games, Regret Minimization, Reinforcement Learning, Texas Hold'em
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Action abstraction is critical for solving imperfect information extensive-form games (IIEFGs) with large action spaces. However, due to the large number of states and the high computational complexity of IIEFGs, existing methods often rely on a fixed abstraction, which can result in sub-optimal performance. To tackle this issue, we propose a novel Markov Decision Process (MDP) formulation for finding the optimal (and possibly state-dependent) action abstraction. Specifically, the state of the MDP is the public information of the game, each action is a feature vector representing a particular action abstraction, and the reward is the expected-value difference between the selected action abstraction and a default fixed action abstraction. Based on this MDP, we build a game tree with the action abstraction selected by reinforcement learning (RL), and solve for the optimal strategy using counterfactual regret minimization (CFR). This two-phase framework, named RL-CFR, effectively trades off computational complexity (due to CFR) against performance improvement (due to RL) in IIEFGs, and offers a novel RL-guided action abstraction selection for CFR. To demonstrate the effectiveness of RL-CFR, we apply it to Heads-up No-limit (HUNL) Texas Hold'em, a popular representative benchmark for IIEFGs. Our results show that RL-CFR defeats ReBeL, one of the best fixed-action-abstraction-based HUNL algorithms, and the strong HUNL agent Slumbot by significant win-rate margins of $64\pm 11$ and $84\pm 17$ mbb/hand, respectively.
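To make the MDP formulation in the abstract concrete, below is a minimal Python sketch of its state, action, and reward, under stated assumptions: the names (PublicState, cfr_value, DEFAULT_ABSTRACTION) and the bet-size encoding are illustrative, not the authors' actual implementation, and the CFR subgame solve is stubbed out.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class PublicState:
    """MDP state: the public information of the game
    (e.g., board cards, pot size, betting history in HUNL)."""
    board: tuple
    pot: float
    history: tuple

# MDP action: a feature vector describing one action abstraction,
# e.g., candidate bet sizes as fractions of the pot (hypothetical encoding).
AbstractionAction = List[float]

DEFAULT_ABSTRACTION: AbstractionAction = [0.5, 1.0]  # fixed baseline abstraction

def abstraction_reward(
    state: PublicState,
    action: AbstractionAction,
    cfr_value: Callable[[PublicState, AbstractionAction], float],
) -> float:
    """MDP reward: expected-value difference between the selected
    abstraction and the default fixed abstraction, each evaluated by
    solving the resulting subgame with CFR (cfr_value stands in for
    that solve here)."""
    return cfr_value(state, action) - cfr_value(state, DEFAULT_ABSTRACTION)

if __name__ == "__main__":
    # Placeholder for a real CFR solve, used only to exercise the interface.
    toy_value = lambda s, a: s.pot * max(a)
    s = PublicState(board=("Ah", "Kd", "7c"), pot=10.0, history=("call",))
    print(abstraction_reward(s, [0.75, 1.5], toy_value))  # 15.0 - 10.0 = 5.0
```

In this reading, the RL policy proposes an abstraction feature vector per public state, and the value gap over the default fixed abstraction supplies the training signal that guides abstraction selection inside CFR.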
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2605