Double Neural Counterfactual Regret Minimization

Hui Li; Kailiang Hu; Zhibang Ge; Tao Jiang; Yuan Qi; Le Song

Double Neural Counterfactual Regret Minimization

Hui Li, Kailiang Hu, Zhibang Ge, Tao Jiang, Yuan Qi, Le Song

27 Sept 2018 (modified: 05 May 2023)ICLR 2019 Conference Blind SubmissionReaders: Everyone

Abstract: Counterfactual regret minimization (CRF) is a fundamental and effective technique for solving imperfect information games. However, the original CRF algorithm only works for discrete state and action spaces, and the resulting strategy is maintained as a tabular representation. Such tabular representation limits the method from being directly applied to large games and continuing to improve from a poor strategy profile. In this paper, we propose a double neural representation for the Imperfect Information Games, where one neural network represents the cumulative regret, and the other represents the average strategy. Furthermore, we adopt the counterfactual regret minimization algorithm to optimize this double neural representation. To make neural learning efficient, we also developed several novel techniques including a robust sampling method, mini-batch Monte Carlo counterfactual regret minimization (MCCFR) and Monte Carlo counterfactual regret minimization plus (MCCFR+) which may be of independent interests. Experimentally, we demonstrate that the proposed double neural algorithm converges significantly better than the reinforcement learning counterpart.

Keywords: Counterfactual Regret Minimization, Imperfect Information game

TL;DR: We proposed a double neural CFR which can match the performance of tabular based CFR and opens up the possibility for a purely neural approach to directly solve large imperfect information game.

17 Replies

Loading