The Stochastic Evolutionary Dynamics of Softmax Policy Gradient in Games

Published: 01 Jan 2024 · Last Modified: 15 May 2025 · AAMAS 2024 · CC BY-SA 4.0
Abstract: The theoretical underpinnings of multi-agent learning have recently attracted much attention. In this paper, we study the learning dynamics of the softmax policy gradient (PG) algorithm in multi-agent environments through the lens of evolutionary game theory. We revisit previous analyses based on mean dynamics and observe that these models fail to characterize the effect of stochasticity. To this end, we propose a stochastic dynamics model to analyse the learning dynamics of PG in symmetric games. We model the parameter dynamics of the learning agent as a multidimensional Wiener process and, by applying Itô's lemma, obtain the corresponding policy dynamics for the agent. From these, we study the convergence behaviour of the policy dynamics under the self-play training scheme for learning in games. We derive sufficient conditions for the stochastic stability of pure Nash equilibrium strategies, and we establish sufficient conditions for the existence of a stationary distribution in strictly stable games. Moreover, we express the dynamics of the parameter distribution with the Fokker-Planck equation. In our experiments, we demonstrate that the stochastic dynamics model consistently provides a significantly more accurate description of the actual learning dynamics than the mean dynamics model across different games and settings.
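As a rough illustration of the pipeline the abstract describes, the following sketch writes the parameter process as a generic Itô diffusion, pushes it through the softmax via Itô's lemma, and reads off the density evolution from the Fokker-Planck equation. The drift $g$ and diffusion $\Sigma$ here are hypothetical placeholders, not the paper's exact terms.

```latex
% Schematic sketch (hypothetical drift g and diffusion \Sigma, assumed for illustration).
% Parameter dynamics as a multidimensional It\^o diffusion, with softmax policy:
\[
  \mathrm{d}\theta_t = g(\theta_t)\,\mathrm{d}t + \Sigma(\theta_t)\,\mathrm{d}W_t ,
  \qquad
  \pi_a(\theta) = \frac{e^{\theta_a}}{\sum_b e^{\theta_b}} .
\]
% It\^o's lemma applied to the softmax policy, using the softmax Jacobian
% \partial \pi_a / \partial \theta_b = \pi_a (\delta_{ab} - \pi_b):
\[
  \mathrm{d}\pi_a
  = \Big[ \nabla_\theta \pi_a^{\top} g
        + \tfrac{1}{2}\operatorname{tr}\!\big(\Sigma \Sigma^{\top} \nabla_\theta^2 \pi_a\big)
    \Big]\mathrm{d}t
  + \nabla_\theta \pi_a^{\top}\, \Sigma \,\mathrm{d}W_t .
\]
% Fokker--Planck equation for the parameter density p(\theta, t):
\[
  \partial_t p
  = -\nabla_\theta \cdot \big( g\, p \big)
  + \tfrac{1}{2} \sum_{i,j} \partial_{\theta_i} \partial_{\theta_j}
      \big[ (\Sigma \Sigma^{\top})_{ij}\, p \big] .
\]
```

The second-order Itô correction term is what a deterministic mean-dynamics model omits, which is one way to see why the two descriptions can diverge.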