Last-Iterate Convergence to Approximate Nash Equilibria in Multiplayer Imperfect Information Games

Published: 01 Jan 2025, Last Modified: 16 Oct 2025, IEEE Trans. Neural Networks Learn. Syst. 2025, CC BY-SA 4.0
Abstract: Imperfect information and multiple players are two common features of real-world games. However, few existing game-theoretic methods can find Nash equilibria in multiplayer imperfect information games (IIGs). Moreover, the commonly used methods that rely on average-iterate convergence are ill-suited to deep reinforcement learning (DRL), which is widely applied to large-scale problems, because preserving average policies under function approximation is costly. To address these problems, we construct a continuous-time dynamic named imperfect-information exponential-decay score-based learning (IESL) based on the concept of the Nash distribution [a type of quantal response equilibrium (QRE)] in IIGs. Theoretically, we prove the last-iterate convergence of IESL to approximate Nash equilibria in multiplayer IIGs under the assumption of individual concavity. Empirically, we verify that IESL converges in six poker scenarios, with a lower ultimate NashConv than the comparative methods [including counterfactual regret minimization (CFR), replicator dynamics (RDs), and their variants] in multiplayer Leduc hold'em. Compared with existing equilibrium-finding algorithms in multiplayer normal-form games (NFGs), IESL also demonstrates more stable performance. In addition, we observe a trade-off between the difficulty of IESL's last-iterate convergence and the NashConv of the convergent policies, which aligns with our convergence analysis based on the hypomonotonicity of the game.
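The abstract's evaluation metric, NashConv, measures how far a strategy profile is from a Nash equilibrium: the sum over players of the gain each would obtain by unilaterally switching to a best response. The following is a minimal sketch of that computation for a two-player zero-sum normal-form game (rock-paper-scissors); it illustrates only the metric, not the IESL dynamic itself, and all function and variable names here are illustrative, not from the paper.

```python
import numpy as np

# Payoff matrix for player 0 in rock-paper-scissors;
# under zero-sum, player 1's payoff is the negation.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def nash_conv(x, y, A):
    """NashConv = sum over players of (best-response value - current value)."""
    u0 = x @ A @ y             # player 0's expected payoff under (x, y)
    u1 = -u0                   # zero-sum: player 1's payoff
    br0 = np.max(A @ y)        # player 0's best-response value against y
    br1 = np.max(-(A.T @ x))   # player 1's best-response value against x
    return (br0 - u0) + (br1 - u1)

uniform = np.ones(3) / 3
print(nash_conv(uniform, uniform, A))  # uniform play is the NE, so NashConv is 0.0

biased = np.array([0.5, 0.3, 0.2])
print(nash_conv(biased, uniform, A))   # deviating from uniform raises NashConv
```

A profile is an exact Nash equilibrium precisely when its NashConv is zero; the paper's "approximate Nash equilibria" correspond to profiles whose NashConv is small but nonzero.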