Keywords: Regret Matching$^+$, Last-Iterate Convergence, Nash Equilibrium
TL;DR: We investigate last-iterate convergence of Regret Matching$^+$ variants in games satisfying the weak Minty variational inequality.
Abstract: Regret Matching$^+$ (RM$^+$) variants are widely used to build superhuman Poker AIs, yet few studies investigate their last-iterate convergence in learning a Nash equilibrium (NE). Although their last-iterate convergence is established for games satisfying the Minty Variational Inequality (MVI), no studies have demonstrated that these algorithms achieve such convergence in the broader class of games satisfying the weak MVI. A key challenge in proving last-iterate convergence for RM$^+$ variants in games satisfying the weak MVI is that, even if the game's loss gradient satisfies the weak MVI, RM$^+$ variants operate on a transformed loss feedback that does not satisfy the weak MVI. To establish last-iterate convergence for RM$^+$ variants, we introduce a concise yet novel proof paradigm that involves: (i) transforming an RM$^+$ variant into an Online Mirror Descent (OMD) instance that updates within the original strategy space of the game to recover the weak MVI, and (ii) showing last-iterate convergence by proving that the distance between accumulated regrets converges to zero via the recovered weak MVI of the feedback. Inspired by our proof paradigm, we propose Smooth Optimistic Gradient Based RM$^+$ (SOGRM$^+$) and show that it achieves last-iterate and finite-time best-iterate convergence in learning an NE of games satisfying the weak MVI, the weakest condition known for any RM$^+$ variant. Experiments show that SOGRM$^+$ significantly outperforms other algorithms. Our code is available at https://github.com/menglinjian/NeurIPS-2025-SOGRM.
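For reference, the two conditions contrasted in the abstract can be stated in standard operator notation; the symbols below ($F$ for the game's loss-gradient operator, $\mathcal{Z}$ for the joint strategy space, $z^\ast$ for a candidate solution, $\rho$ for the weakness parameter) are our own shorthand and are not taken from the submission:
$$\text{(MVI)}\qquad \exists\, z^\ast \in \mathcal{Z} \ \text{such that}\ \langle F(z),\, z - z^\ast \rangle \ge 0 \quad \text{for all } z \in \mathcal{Z};$$
$$\text{(weak MVI)}\qquad \exists\, z^\ast \in \mathcal{Z},\ \rho \ge 0 \ \text{such that}\ \langle F(z),\, z - z^\ast \rangle \ge -\tfrac{\rho}{2}\,\|F(z)\|^2 \quad \text{for all } z \in \mathcal{Z}.$$
The MVI is the special case $\rho = 0$, so games satisfying the weak MVI form a strictly larger class. For added context on why the feedback is "transformed", the standard vanilla RM$^+$ update (the baseline the abstract's variants build on, not the paper's SOGRM$^+$) maintains accumulated regrets $R^t$ in the nonnegative orthant and plays their normalization,
$$R^{t+1} = \big[\,R^t + \langle \ell^t, x^t\rangle \mathbf{1} - \ell^t\,\big]^+, \qquad x^{t+1} = R^{t+1} / \|R^{t+1}\|_1,$$
with $x^{t+1}$ uniform when $R^{t+1} = \mathbf{0}$ and $[\cdot]^+$ denoting the componentwise positive part. Because the update acts on $R^t$ rather than directly on the simplex strategy $x^t$, the operator the algorithm effectively sees is a transformation of the game's loss gradient, which is the obstacle the abstract describes.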
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 8506