Keywords: Regret Matching$^+$, Last-Iterate Convergence, Nash Equilibrium
TL;DR: We investigate the last-iterate convergence of Regret Matching$^+$ variants in games satisfying monotonicity, or only the weak Minty variational inequality.
Abstract: Regret Matching$^+$ (RM$^+$) variants have been widely used to develop superhuman poker AIs, yet few studies investigate their last-iterate convergence. Their last-iterate convergence has been demonstrated only for games with strong monotonicity or for two-player zero-sum matrix games. A primary obstacle in proving the last-iterate convergence of these algorithms is that their feedback is not the loss gradient of the vanilla game. This deviation results in the absence of crucial properties, e.g., monotonicity or the weak Minty variational inequality (MVI), which are pivotal for establishing last-iterate convergence. To address the absence of these properties, we propose a remarkably succinct yet novel proof paradigm that consists of: (i) recovering these key properties through the equivalence between RM$^+$ and Online Mirror Descent (OMD), and (ii) measuring the distance to the Nash equilibrium (NE) via the tangent residual, showing that this distance is related to the distance between accumulated regrets. To demonstrate the practical applicability of our proof paradigm, we use it to prove the last-iterate convergence of two existing smooth RM$^+$ variants, Smooth Extra-gradient RM$^+$ (SExRM$^+$) and Smooth Predictive RM$^+$ (SPRM$^+$). We show that they achieve last-iterate convergence in learning an NE of games satisfying monotonicity, a weaker condition than the one used in existing proofs for both variants. Then, inspired by our proof paradigm, we propose Smooth Optimistic Gradient RM$^+$ (SOGRM$^+$). We show that SOGRM$^+$ achieves last-iterate convergence in learning an NE of games satisfying the weak MVI, the weakest condition in all known proofs for RM$^+$ variants. The experimental results show that SOGRM$^+$ significantly outperforms other algorithms.
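For readers unfamiliar with the RM$^+$ family discussed in the abstract, here is a minimal self-play sketch of vanilla RM$^+$ on a two-player zero-sum matrix game. Note that vanilla RM$^+$ is only guaranteed to converge in its *average* iterate; the paper's contribution concerns last-iterate convergence of the smooth variants (SExRM$^+$, SPRM$^+$, SOGRM$^+$), which are not reproduced here. The function name `rm_plus_selfplay` and the payoff matrix are illustrative choices, not from the paper.

```python
import numpy as np

def rm_plus_selfplay(A, iters=5000):
    """Vanilla Regret Matching+ self-play on a zero-sum matrix game.

    A: payoff matrix for the row player (row player maximizes x^T A y).
    Returns the *average* strategies, which converge to an NE; the
    last iterates of vanilla RM+ may cycle instead of converging.
    """
    m, n = A.shape
    Rx = np.zeros(m)  # cumulative thresholded regrets, row player
    Ry = np.zeros(n)  # cumulative thresholded regrets, column player
    x_avg, y_avg = np.zeros(m), np.zeros(n)

    def strategy(R):
        # Normalize nonnegative regrets; fall back to uniform if all zero.
        s = R.sum()
        return R / s if s > 0 else np.full(len(R), 1.0 / len(R))

    for _ in range(iters):
        x, y = strategy(Rx), strategy(Ry)
        ux = A @ y           # per-action payoffs for the row player
        uy = -(A.T @ x)      # per-action payoffs for the column player
        # RM+ update: add instantaneous regrets, then threshold at zero.
        Rx = np.maximum(Rx + ux - x @ ux, 0.0)
        Ry = np.maximum(Ry + uy - y @ uy, 0.0)
        x_avg += x
        y_avg += y

    return x_avg / iters, y_avg / iters
```

On matching pennies, for instance, the average strategies approach the unique mixed NE (1/2, 1/2) even though the individual iterates oscillate, which illustrates why last-iterate guarantees require the additional machinery the abstract describes.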
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6343