Abstract: Modelling the dynamics of multi-agent reinforcement learning has long been an important research topic. Most previous works focus on agents learning under global interactions. In this paper, we investigate learning in a population of agents with local interactions, such that agents learn their policies concurrently by playing only with a subset of nearby agents, without knowledge of the whole population. We derive stochastic differential equations (SDEs) that describe the Q-value dynamics of each individual agent in a stochastic environment. Applying the Fokker-Planck equation, we then obtain the time evolution of the probability density function (PDF) of the population's Q-values. We validate the model through comparisons with agent-based simulations on typical symmetric games under various settings, and the results verify that the model precisely captures the behaviour of the multi-agent system.
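As a point of reference for the SDE-to-PDF step described above (the paper's specific drift and diffusion terms are derived in the body and are not reproduced here): for a generic one-dimensional Itô SDE of the form $dq = A(q)\,dt + B(q)\,dW_t$, where $q$ stands for an agent's Q-value, the corresponding Fokker-Planck equation governing the time evolution of the density $p(q, t)$ reads

\[
\frac{\partial p(q,t)}{\partial t}
= -\frac{\partial}{\partial q}\big[A(q)\,p(q,t)\big]
+ \frac{1}{2}\frac{\partial^2}{\partial q^2}\big[B^2(q)\,p(q,t)\big].
\]

The drift term $A(q)$ captures the expected Q-learning update, while the diffusion term $B(q)$ captures the stochasticity arising from the environment and the sampled local interactions.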