Advantage Constrained Proximal Policy Optimization in Multi-Agent Reinforcement LearningDownload PDF

Published: 01 Feb 2023, Last Modified: 13 Feb 2023Submitted to ICLR 2023Readers: Everyone
Keywords: Multi agent, reinforcement learning, neural network, deep learning, trust region.
TL;DR: A multi-agent policy gradient reinforcement learning based on local advantage constraned.
Abstract: We explore the value-based method and policy gradient combination in multi-agent reinforcement learning (MARL). In value-based MARL, {\itshape{Individual-Global-Max}} (IGM) principle plays an important role, which maintains the consistency between joint and local action values. At the same time, IGM is difficult to guarantee in multi-agent policy gradient methods due to stochastic exploration and conflicting gradient directions. In this paper, we propose a novel multi-agent policy gradient algorithm called {\itshape{Advantage Constrained Proximal Policy Optimization}} (ACPPO). Based on {\itshape{multi-agent advantage decomposition lemma}}, ACPPO introduces an advantage network for each agent to estimate current local state-action advantage. The coefficient of each agent constrains the joint-action advantage according to the consistency of the estimated joint-action advantage and local advantage. Unlike previous policy gradient-based MARL algorithms, ACPPO does not need an extra sampled baseline to reduce variance. We evaluate the proposed methods for continuous matrix game and Multi-Agent MuJoCo tasks. Results show that ACPPO outperforms the baselines such as MAPPO, MADDPG, and HAPPO.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)
5 Replies