Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

Stefanos Leonardos; Georgios Piliouras; Kelly Spendlove

Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

Stefanos Leonardos, Georgios Piliouras, Kelly Spendlove

Published: 09 Nov 2021, Last Modified: 05 May 2023NeurIPS 2021 SpotlightReaders: Everyone

Keywords: Exploration-Exploitation, Q-learning, Zero-Sum Polymatrix Games, Quantal Response Equilibria, Bounded Rationality

TL;DR: Complementing recent results about convergence in weighted potential games, we show that Q-learning converges both in competitive as well as cooperative settings, regardless of the number of agents and without any need for parameter fine-tuning.

Abstract: The interplay between exploration and exploitation in competitive multi-agent learning is still far from being well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results about convergence in weighted potential games [16,34], we show that fast convergence of Q-learning in competitive settings obtains regardless of the number of agents and without any need for parameter fine-tuning. As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf

Code: zip

9 Replies

Loading