The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games

Chao Yu; Akash Velu; Eugene Vinitsky; Jiaxuan Gao; Yu Wang; Alexandre Bayen; Yi Wu

Published: 17 Sept 2022, Last Modified: 20 Apr 2025, Venue: NeurIPS 2022 Datasets and Benchmarks, Readers: Everyone

Keywords: Multi-Agent Reinforcement Learning, Proximal Policy Optimization, Cooperative Games

TL;DR: We demonstrate PPO's effectiveness in popular multi-agent benchmarks and analyze its properties and implementation details through empirical studies.

Abstract: Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning algorithm but is used far less than off-policy learning algorithms in multi-agent settings. This is often due to the belief that PPO is significantly less sample-efficient than off-policy methods in multi-agent systems. In this work, we carefully study the performance of PPO in cooperative multi-agent settings. We show that PPO-based multi-agent algorithms achieve surprisingly strong performance in four popular multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, the Hanabi challenge, and Google Research Football, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. Importantly, compared to strong off-policy methods, PPO often achieves competitive or superior results in both final returns and sample efficiency. Finally, through ablation studies, we analyze implementation and hyperparameter factors that are critical to PPO's empirical performance, and give concrete practical suggestions regarding these factors. Our results show that when using these practices, simple PPO-based methods are a strong baseline in cooperative multi-agent reinforcement learning. Source code is released at https://github.com/marlbenchmark/on-policy.

Supplementary Material: pdf

URL: https://github.com/marlbenchmark/on-policy, https://github.com/marlbenchmark/off-policy

License: MIT License

Author Statement: Yes

Contribution Process Agreement: Yes

In Person Attendance: Yes

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2103.01955/code)
