Regularization is Enough for Last-Iterate Convergence in Zero-Sum Games

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: two-player zero-sum games, reinforcement learning, last-iterate convergence
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Recent literature has shown rising interest in learning Nash equilibria with last-iterate convergence guarantees. In this paper, we introduce a novel approach, Regularized Follow-the-Regularized-Leader (RegFTRL), for learning equilibria in two-player zero-sum games. RegFTRL is an efficient variant of FTRL, enriched with an adaptive regularization that includes the well-known entropy regularization as a special case. In normal-form games (NFGs), RegFTRL exhibits last-iterate linear convergence to an approximate equilibrium, and it converges to an exact Nash equilibrium through adaptive adjustment of the regularization. In extensive-form games (EFGs), we show that the entropy-regularized Multiplicative Weights Update (MWU), a specific instance of RegFTRL, achieves a last-iterate linear convergence rate to the quantal response equilibrium, without requiring either an optimistic update or a uniqueness assumption. These results show that regularization alone is enough for last-iterate convergence. Additionally, we propose FollowMu, a practical implementation of RegFTRL with a neural network as the function approximator, for model-free learning in sequential non-stationary environments. Finally, empirical results substantiate the theoretical properties of RegFTRL and demonstrate that FollowMu achieves favorable performance in EFGs.
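The central claim above, that an entropy-regularized MWU converges in the last iterate (no averaging, no optimism), can be illustrated with a minimal sketch on a normal-form game. This is not the paper's implementation: the matching-pennies payoff matrix, the step size `eta`, the regularization temperature `tau`, and the specific update form `x ** (1 - eta*tau) * exp(eta * A @ y)` are illustrative assumptions for one common entropy-regularized MWU variant.

```python
# Hedged sketch: entropy-regularized MWU on matching pennies.
# All constants below (payoff matrix, eta, tau, iteration count) are
# illustrative assumptions, not values taken from the paper.
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies; x maximizes x^T A y
eta, tau = 0.1, 0.1                        # step size and regularization strength

x = np.array([0.9, 0.1])                   # deliberately non-uniform starting points
y = np.array([0.2, 0.8])

for _ in range(2000):
    # Regularized multiplicative update:
    #   x_{t+1} ∝ x_t^{1 - eta*tau} * exp(eta * (A y_t))
    # The exponent (1 - eta*tau) < 1 is the entropy-regularization pull toward
    # uniform; plain MWU (tau = 0) would cycle instead of converging.
    x_new = x ** (1 - eta * tau) * np.exp(eta * (A @ y))
    y_new = y ** (1 - eta * tau) * np.exp(-eta * (A.T @ x))
    x, y = x_new / x_new.sum(), y_new / y_new.sum()

# By symmetry, the quantal response equilibrium of this game is uniform play,
# so the last iterates themselves should approach [0.5, 0.5].
print(x, y)
```

The point of the sketch is the contrast with unregularized MWU, whose last iterates orbit the equilibrium; the small entropy term damps that rotation so the iterates spiral in, matching the abstract's claim that regularization alone suffices.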
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3281