MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization

TMLR Paper480 Authors

05 Oct 2022 (modified: 28 Feb 2023)Rejected by TMLREveryoneRevisionsBibTeX
Abstract: This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments without a communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called \textit{Multi-Agent Cooperative Recurrent Proximal Policy Optimization} (MACRPO). We propose two novel ways of integrating information across agents and time in MACRPO: First, we use a recurrent layer in critic's network architecture and propose a new framework to use the proposed meta-trajectory to train the recurrent layer. This allows the network to learn the cooperation and dynamics of interactions between agents, and also handle partial observability. Second, we propose a new advantage function that incorporates other agents' rewards and value functions by controlling the level of cooperation between agents using a parameter. The use of this control parameter is suitable for environments in which the agents are unable to fully cooperate with each other. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces, Deepdrive-Zero, Multi-Walker, and Particle environment. We compare the results with several ablations and state-of-the-art multi-agent algorithms such as MAGIC, IC3Net, CommNet, GA-Comm, QMIX, and MADDPG and also single-agent methods with shared parameters between agents such as IMPALA and APEX. The results show superior performance against other algorithms. The code is available online at \url{https://github.com/kargarisaac/macrpo}.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: The changes are done based on the constructive feedbacks from reviewers and pointed to the section we made the change in our answers to the reviewers. It was mostly about adding a new baseline and two new papers in the related works section and clarification about some points reviewers mentioned.
Assigned Action Editor: ~DJ_Strouse1
Submission Number: 480
Loading