Multi-Agent Matrix Games with Individual Learners: How Exploration-Exploitation Strategies Impact the Emergence of Coordination
Keywords: Reinforcement Learning, Multi-agent, Game theory, Bandit
TL;DR: We study multi-agent matrix games with independent learners and show that convergence to optimal behaviour depends on the interplay between the learners' specific exploration-exploitation strategies.
Abstract: Coordination between independent learning agents in a multi-agent environment is an important problem in which AI systems may impact each other's learning processes. In this paper, we study how individual agents converge to the optimal equilibrium in multi-agent settings where coordination is necessary to achieve optimality.
Specifically, we cover both coordination to maximize each individual's payoff and coordination to maximize the collective payoff (cooperation). We study the emergence of such coordination behaviours in two-player matrix games with unknown payoff matrices and noisy bandit feedback. We consider five different environments along with widely used deterministic and stochastic bandit strategies, and we examine how different learning strategies and observation noise influence convergence to the optimal equilibrium. Our results indicate that coordination often emerges more easily from interactions between deterministic agents, especially when they follow the same learning behaviour. However, stochastic learning strategies appear to be more robust in the presence of many optimal joint actions. Overall, noisy observations often help stabilize learning behaviours.
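To make the setting concrete, below is a minimal sketch (not the paper's code) of the kind of setup the abstract describes: two independent epsilon-greedy bandit learners repeatedly playing a common-payoff 2x2 coordination game with noisy bandit feedback. The payoff matrix, noise level, and exploration rate are illustrative assumptions, not values from the paper.

# Minimal sketch, assuming epsilon-greedy independent learners and an
# illustrative common-payoff 2x2 game; all parameter values are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Common-payoff coordination game: both agents are rewarded when they
# match on action 0 (payoff 1.0) or on action 1 (payoff 0.5).
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 0.5]])
NOISE_STD = 0.1   # std of Gaussian observation noise (assumed)
EPSILON = 0.1     # exploration rate shared by both learners (assumed)
STEPS = 5000

class EpsilonGreedy:
    """Independent learner: tracks per-action value estimates only,
    with no access to the other agent's action or the payoff matrix."""
    def __init__(self, n_actions):
        self.values = np.zeros(n_actions)
        self.counts = np.zeros(n_actions)

    def act(self):
        if rng.random() < EPSILON:
            return int(rng.integers(len(self.values)))
        return int(np.argmax(self.values))

    def update(self, action, reward):
        # Incremental sample-average update of the action's value estimate.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

agents = [EpsilonGreedy(2), EpsilonGreedy(2)]
for _ in range(STEPS):
    a0, a1 = agents[0].act(), agents[1].act()
    # Bandit feedback: each agent observes only its own noisy payoff.
    r = PAYOFF[a0, a1]
    agents[0].update(a0, r + rng.normal(0.0, NOISE_STD))
    agents[1].update(a1, r + rng.normal(0.0, NOISE_STD))

print("greedy joint action:",
      (int(np.argmax(agents[0].values)), int(np.argmax(agents[1].values))))

Running this typically shows both learners settling on the joint action (0, 0), i.e. the optimal equilibrium; swapping in other exploration strategies or noise levels is how one would probe the convergence questions the abstract raises.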
Submission Number: 18