Keywords: reinforcement learning, multi-agent systems, cooperative learning, policy gradients
TL;DR: We present a theoretically grounded recursive reasoning framework that enhances cooperation in multi-agent reinforcement learning in both on- and off-policy algorithms.
Abstract: Policy gradient algorithms for deep multi-agent reinforcement learning (MARL) typically employ an update that responds to the current strategies of other agents. While straightforward, this approach does not account for the updates of other agents within the same update step, resulting in miscoordination and reduced sample efficiency. In this paper, we introduce methods that recursively refine the policy gradient by updating each agent against the updated policies of other agents within the same update step, speeding up the discovery of effective coordinated policies. We provide principled implementations of recursive reasoning in MARL by applying it to competitive multi-agent algorithms in both on- and off-policy regimes. Empirically, we demonstrate superior performance and sample efficiency over existing deep MARL algorithms in StarCraft II and multi-agent MuJoCo. We theoretically prove that higher recursive reasoning in gradient-based methods with finite iterates achieves monotonic convergence to a local Nash equilibrium under certain conditions.
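To make the recursive refinement idea concrete, below is a minimal sketch of a k-level look-ahead policy-gradient step for two agents in PyTorch. It is an illustration of the general technique described in the abstract, not the paper's actual algorithm: the function names (`recursive_policy_gradient_step`, `joint_return`), the shared differentiable surrogate objective, the two-agent setting, and the fixed learning rate are all assumptions made for the example.

```python
import torch

def recursive_policy_gradient_step(theta1, theta2, joint_return, K=2, lr=1e-2):
    """One update step in which each agent responds to a *refined* look-ahead
    policy of the other agent instead of that agent's stale current policy.

    theta1, theta2 : parameter tensors of the two agents' policies (hypothetical)
    joint_return   : differentiable surrogate objective J(theta1, theta2) (assumed shared)
    K              : depth of recursive reasoning; K = 0 recovers the naive update
    """
    # Look-ahead copies that are refined recursively within this single step.
    look1 = theta1.detach().clone().requires_grad_(True)
    look2 = theta2.detach().clone().requires_grad_(True)

    for _ in range(K):
        # Agent 1 refines its look-ahead policy against agent 2's current look-ahead.
        g1 = torch.autograd.grad(joint_return(look1, look2), look1)[0]
        look1 = (look1 + lr * g1).detach().requires_grad_(True)
        # Agent 2 refines against agent 1's just-refined look-ahead policy.
        g2 = torch.autograd.grad(joint_return(look1, look2), look2)[0]
        look2 = (look2 + lr * g2).detach().requires_grad_(True)

    # Each agent's actual gradient is taken against the other's refined policy.
    g1 = torch.autograd.grad(joint_return(theta1, look2), theta1)[0]
    g2 = torch.autograd.grad(joint_return(look1, theta2), theta2)[0]
    return theta1 + lr * g1, theta2 + lr * g2

# Toy usage with a stand-in cooperative objective (purely illustrative).
theta1 = torch.randn(4, requires_grad=True)
theta2 = torch.randn(4, requires_grad=True)
J = lambda a, b: -(a - b).pow(2).sum()
theta1, theta2 = recursive_policy_gradient_step(theta1, theta2, J, K=3)
```

The key design point the sketch tries to convey is that the gradient each agent finally applies is evaluated against the other agent's look-ahead parameters after K refinement rounds, rather than the parameters held at the start of the update step.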
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 21973