Keywords: multi-agent reinforcement learning, consensus, learning dynamics
Abstract: Multi-agent reinforcement learning (MARL) has made significant progress across diverse fields. A key challenge in MARL is achieving consensus, which aligns individual estimates, reduces non-stationarity, and promotes coordinated behavior among agents. In this paper, we study a MARL system in which agents interact in stochastic games and update their value estimates and policies through independent Q-learning. In contrast to the prevailing literature, which requires explicit consensus protocols, we investigate how consensus can emerge intrinsically, without assuming any external coordination. We find that the covariance between Q-values and temporal-difference (TD) targets is the key quantity governing consensus, and that the dynamics of the Q-value variance correspond directly to the second-order Price equation from evolutionary game theory. In addition, we prove that in large-scale anonymous stochastic games, in the large-batch-size limit, independent learners naturally achieve consensus. We validate our findings through extensive agent-based simulations. Our results provide new insights into the learning dynamics of large-scale MARL systems, reveal the potential of intrinsic consensus to advance both theory and practice, and pave the way toward more scalable and efficient intelligent systems.
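For context, a minimal sketch of the first-order Price equation in its standard evolutionary form is reproduced below; reading Q-values as trait values and TD targets as fitness is our gloss on the abstract's claim, not a restatement of the paper's derivation, and the second-order (variance) analogue invoked above is not reproduced here.

$$
\Delta \bar{z} \;=\; \underbrace{\frac{\operatorname{Cov}(w_i, z_i)}{\bar{w}}}_{\text{selection}} \;+\; \underbrace{\frac{\mathbb{E}\!\left[w_i \,\Delta z_i\right]}{\bar{w}}}_{\text{transmission}},
$$

where $z_i$ is the trait value of individual $i$, $w_i$ its fitness, $\bar{w}$ the population-mean fitness, and $\Delta \bar{z}$ the change in the mean trait over one generation.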
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 21781