Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games

Published: 25 Apr 2022, Last Modified: 05 May 2023
Venue: ICLR 2022 Workshop on Gamification and Multiagent Solutions
Keywords: Multi-agent Reinforcement Learning, Markov Potential Games, Policy Gradient
TL;DR: Convergence of policy gradient in a class of MDPs called Markov Potential Games in which cooperation is desired.
Abstract: Potential games are one of the most important and widely studied classes of normal-form games. They define the archetypal setting of multi-agent coordination in which all agents' utilities are perfectly aligned via a common potential function. Can we embed this intuitive framework in the setting of Markov games? What are the similarities and differences between multi-agent coordination with and without state dependence? To answer these questions, we study a natural class of Markov Potential Games (MPGs) that generalizes prior attempts to capture complex stateful multi-agent coordination. Counter-intuitively, insights from normal-form potential games do not carry over: an MPG may contain state-games that are zero-sum, while a Markov game in which every state-game is a potential game is not necessarily an MPG. Nevertheless, MPGs showcase standard desirable properties such as the existence of deterministic Nash policies. In our main result, we prove convergence of independent policy gradient and its stochastic counterpart to Nash policies at a rate that is polynomial in the approximation error, by adapting single-agent gradient domination properties to multi-agent settings. This answers questions on the convergence of finite-sample, independent policy gradient methods beyond settings of purely conflicting or purely common interests.
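To make the setting concrete, below is a minimal, hypothetical sketch of independent projected policy gradient in a small tabular two-agent Markov game with identical rewards (a simple instance of a Markov Potential Game). The states, transition kernel, rewards, step size, and the finite-difference gradient are illustrative assumptions for this sketch, not the paper's exact algorithm or analysis; it only shows the independent-update structure the abstract refers to, where each agent ascends its own value while treating the other agent as part of the environment.

```python
import numpy as np

# Minimal sketch: independent projected policy gradient in a small two-agent
# Markov game with a shared reward (a simple example of a Markov Potential Game).
# All quantities below (S, A, P, R, eta, horizon of the loop) are hypothetical.

rng = np.random.default_rng(0)
S, A, gamma = 2, 2, 0.9                          # states, actions per agent, discount

P = rng.dirichlet(np.ones(S), size=(S, A, A))    # P[s, a1, a2] -> next-state distribution
R = rng.random((S, A, A))                        # common reward r(s, a1, a2)

def value(pi1, pi2):
    """Discounted value V(s) of the joint policy (pi1, pi2), via a linear solve."""
    joint = np.einsum('sa,sb->sab', pi1, pi2)    # per-state joint action distribution
    r = np.einsum('sab,sab->s', joint, R)        # expected reward per state
    T = np.einsum('sab,sabt->st', joint, P)      # induced state-transition matrix
    return np.linalg.solve(np.eye(S) - gamma * T, r)

def project_simplex(x):
    """Euclidean projection of each row of x onto the probability simplex."""
    u = np.sort(x, axis=1)[:, ::-1]
    css = np.cumsum(u, axis=1) - 1
    rho = np.sum(u > css / np.arange(1, x.shape[1] + 1), axis=1)
    theta = css[np.arange(x.shape[0]), rho - 1] / rho
    return np.maximum(x - theta[:, None], 0)

def grad(pi_self, pi_other, agent):
    """Finite-difference gradient of an agent's average value w.r.t. its own policy table."""
    g, eps = np.zeros_like(pi_self), 1e-5
    for s in range(S):
        for a in range(A):
            p = pi_self.copy(); p[s, a] += eps
            args = (p, pi_other) if agent == 0 else (pi_other, p)
            base = (pi_self, pi_other) if agent == 0 else (pi_other, pi_self)
            g[s, a] = (value(*args).mean() - value(*base).mean()) / eps
    return g

# Independent updates: each agent uses only its own gradient, with the other
# agent's current policy held fixed, followed by projection back to the simplex.
pi1 = np.full((S, A), 1 / A)
pi2 = np.full((S, A), 1 / A)
eta = 0.1
for _ in range(200):
    g1 = grad(pi1, pi2, agent=0)
    g2 = grad(pi2, pi1, agent=1)
    pi1 = project_simplex(pi1 + eta * g1)
    pi2 = project_simplex(pi2 + eta * g2)

print("value of the learned joint policy:", value(pi1, pi2))
```

In this toy shared-reward case both agents ascend the same objective; the paper's contribution is that such independent updates, including their stochastic finite-sample counterparts, converge to Nash policies in the broader class of MPGs, where individual rewards need not coincide but are aligned through a potential function.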