Modelling the Dynamics of Multi-Agent Q-learning: The Stochastic Effects of Local Interaction and Incomplete Information
Abstract: The theoretical underpinnings of multi-agent reinforcement learning have recently attracted much attention. In this work, we focus on the generalized social learning (GSL) protocol, an agent interaction protocol widely adopted in the literature, and aim to develop an accurate theoretical model of the Q-learning dynamics under this protocol. Noting that previous models fail to characterize the effects of local interactions and incomplete information that arise from GSL, we model the Q-value dynamics of each individual agent as a system of stochastic differential equations (SDEs). Based on these SDEs, we express the time evolution of the probability density function of Q-values in the population with a Fokker-Planck equation. We validate the correctness of our model through extensive comparisons with agent-based simulation results across different types of symmetric games. In addition, we show that as interactions between agents become more limited and information less complete, the population can converge to an outcome that is qualitatively different from the one reached under global interactions and complete information.
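For orientation, the abstract appeals to the standard correspondence between an SDE and its Fokker-Planck equation. The paper's specific drift and diffusion terms are not given here, so $\mu$ and $\sigma$ below are placeholders rather than the authors' derived coefficients. Writing the Q-values of an agent as a vector $Q_t$ driven by a Wiener process $W_t$,

$$dQ_t = \mu(Q_t)\,dt + \sigma(Q_t)\,dW_t,$$

the probability density $p(q,t)$ of Q-values across the population then evolves according to

$$\frac{\partial p(q,t)}{\partial t} = -\sum_i \frac{\partial}{\partial q_i}\bigl[\mu_i(q)\,p(q,t)\bigr] + \frac{1}{2}\sum_{i,j}\frac{\partial^2}{\partial q_i\,\partial q_j}\bigl[(\sigma\sigma^\top)_{ij}(q)\,p(q,t)\bigr].$$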
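To make the validation setup concrete, below is a minimal sketch of the kind of agent-based simulation the abstract describes: a population of Q-learners playing a symmetric 2x2 game under local, pairwise interaction with incomplete information. The GSL protocol's exact matching and information rules are not specified in the abstract, so the ring neighborhood, softmax policy, and coordination-game payoffs here are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

# Illustrative agent-based simulation: a population of stateless Q-learners
# in a symmetric 2-action game with local pairwise interactions.
# Neighborhood structure, policy, and payoffs are assumptions.

rng = np.random.default_rng(0)

N = 200        # population size
K = 2          # neighborhood radius on a ring (larger K -> more global interaction)
ALPHA = 0.1    # Q-learning step size
TAU = 1.0      # softmax (Boltzmann) temperature
STEPS = 5000

# Payoff matrix of a symmetric 2-action coordination game (assumed).
PAYOFF = np.array([[4.0, 0.0],
                   [0.0, 2.0]])

Q = rng.normal(0.0, 0.1, size=(N, 2))  # per-agent Q-values, one per action

def softmax(q, tau):
    z = np.exp((q - q.max()) / tau)
    return z / z.sum()

for _ in range(STEPS):
    i = rng.integers(N)                                          # focal agent
    j = (i + rng.integers(1, K + 1) * rng.choice([-1, 1])) % N   # ring neighbor
    a_i = rng.choice(2, p=softmax(Q[i], TAU))
    a_j = rng.choice(2, p=softmax(Q[j], TAU))
    # Incomplete information: each agent observes only its own payoff and
    # updates only the Q-value of the action it actually played.
    Q[i, a_i] += ALPHA * (PAYOFF[a_i, a_j] - Q[i, a_i])
    Q[j, a_j] += ALPHA * (PAYOFF[a_j, a_i] - Q[j, a_j])

# The empirical distribution of Q-values across agents is the quantity whose
# continuous-time density evolution the Fokker-Planck equation models.
print("mean Q-values across the population:", Q.mean(axis=0))
```

Comparing histograms of the simulated Q-values against the density predicted by the Fokker-Planck equation, for small and large K, is one way to reproduce the kind of local-versus-global comparison the abstract reports.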