Hierarchical Deep Counterfactual Regret Minimization

06 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Counterfactual Regret Minimization, Hierarchical Reinforcement Learning
TL;DR: HDCFR extends Deep CFR with hierarchical policies and low-variance outcome sampling. It preserves CFR’s convergence, scales to long-horizon IIGs, outperforms strong poker baselines, and learns interpretable skills.
Abstract: Imperfect Information Games (IIGs) model settings where decision makers face uncertainty or lack complete information. Counterfactual Regret Minimization (CFR) is one of the most successful families of algorithms for tackling IIGs. Integrating skill-based strategy learning with CFR could mirror a more human-like decision-making process and enhance learning performance on complex IIGs. It enables the learning of a hierarchical strategy, wherein low-level components represent skills for solving subgames and the high-level component manages transitions between skills. In this paper, we introduce the first hierarchical version of Deep CFR (HDCFR), an innovative method that boosts learning efficiency in tasks with extremely large state spaces and deep game trees. Notably, HDCFR enables learning with predefined (human) expertise and extracting skills transferable to similar tasks. To achieve this, we first design our algorithm and establish its theoretical foundations in a tabular setting, including hierarchical CFR updating rules and a variance-reduced Monte Carlo sampling extension. Then, we propose deep learning objectives that extend the algorithm to large-scale tasks while preserving its theoretical guarantees. The code is available at: https://anonymous.4open.science/r/HDCFR.
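To make the hierarchical CFR idea in the abstract concrete, below is a minimal sketch (not the paper's actual implementation) of regret matching applied at two levels of a single information set: a high-level component that selects a skill z, and a skill-conditioned low-level component that selects an action a. All names (regret_matching, high_regret, low_regret) and the toy regret values are hypothetical assumptions for illustration.

```python
import numpy as np

def regret_matching(regrets):
    """Convert cumulative counterfactual regrets into a strategy.

    Positive regrets are normalized into a probability distribution;
    if no regret is positive, fall back to the uniform strategy.
    """
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.ones_like(regrets) / len(regrets)

# Hypothetical hierarchical strategy at one information set: both levels
# keep their own regret tables, loosely mirroring HDCFR's hierarchical
# CFR updating rules (high-level over skills, low-level over actions).
rng = np.random.default_rng(0)
num_skills, num_actions = 3, 4
high_regret = rng.normal(size=num_skills)                 # R(I, z)
low_regret = rng.normal(size=(num_skills, num_actions))   # R(I, z, a)

skill_probs = regret_matching(high_regret)    # high-level strategy
z = rng.choice(num_skills, p=skill_probs)     # sample a skill
action_probs = regret_matching(low_regret[z]) # low-level strategy given z
a = rng.choice(num_actions, p=action_probs)   # sample an action
print(f"skill={z}, action={a}")
```

In a full CFR iteration, the sampled counterfactual values would be used to accumulate regrets at both levels before the next regret-matching step; the variance-reduced Monte Carlo sampling the abstract mentions would govern which branches of the game tree contribute to those updates.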
Primary Area: reinforcement learning
Submission Number: 2581