Abstract: Training multiple agents to coordinate is an important problem with
applications in robotics, game theory, economics, and social sciences. However, most existing Multi-Agent Reinforcement Learning
(MARL) methods are online and thus impractical for real-world
applications in which collecting new interactions is costly or dangerous. While these algorithms should leverage offline data when
available, doing so gives rise to what we call the offline coordination problem. Specifically, we identify and formalize the strategy
agreement (SA) and the strategy fine-tuning (SFT) coordination challenges, two issues that current offline MARL algorithms fail to solve.
Concretely, we reveal that the prevalent model-free methods are
severely deficient and cannot handle coordination-intensive offline
multi-agent tasks in either toy or MuJoCo domains. To address
this shortcoming, we emphasize the importance of inter-agent interactions and propose the first model-based offline MARL method.
Our resulting algorithm, Model-based Offline Multi-Agent Proximal
Policy Optimization (MOMA-PPO), generates synthetic interaction
data and enables agents to converge on a strategy while fine-tuning
their policies accordingly. This simple model-based solution solves
the coordination-intensive offline tasks, significantly outperforming the prevalent model-free methods even under severe partial
observability and with learned world models.
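To make the high-level pipeline concrete, below is a minimal, hypothetical sketch of a model-based offline MARL loop: fit a world model on the offline joint-transition data, then roll it out under the current joint policy to produce synthetic interactions for policy fine-tuning. All names (WorldModel fitting via `train_world_model`, `generate_synthetic_rollouts`, the stub policies) and the linear dynamics model are illustrative assumptions, not the paper's implementation, which relies on learned world models and multi-agent PPO updates.

```python
import numpy as np

# Hypothetical sketch only: components and names are assumptions for
# illustration, not the authors' MOMA-PPO implementation.

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, N_AGENTS, HORIZON = 4, 2, 2, 5


def train_world_model(dataset):
    """Fit a crude linear dynamics model on offline joint transitions."""
    X = np.concatenate([dataset["obs"], dataset["act"]], axis=1)
    Y = dataset["next_obs"]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def generate_synthetic_rollouts(W, policies, start_obs, horizon=HORIZON):
    """Roll the learned model forward under the current joint policy."""
    obs, traj = start_obs, []
    for _ in range(horizon):
        act = np.concatenate([pi(obs) for pi in policies])  # joint action
        next_obs = np.concatenate([obs, act]) @ W            # model prediction
        traj.append((obs, act, next_obs))
        obs = next_obs
    return traj


# Offline dataset of joint transitions (random stand-in data).
dataset = {
    "obs": rng.normal(size=(256, OBS_DIM)),
    "act": rng.normal(size=(256, ACT_DIM * N_AGENTS)),
    "next_obs": rng.normal(size=(256, OBS_DIM)),
}

# Each agent's policy is a stub; a real loop would use per-agent PPO policies.
policies = [lambda o: rng.normal(size=ACT_DIM) for _ in range(N_AGENTS)]

W = train_world_model(dataset)
synthetic = generate_synthetic_rollouts(W, policies, dataset["obs"][0])
# A MOMA-PPO-style loop would now run multi-agent PPO updates on `synthetic`,
# letting agents converge on and fine-tune a joint strategy.
print(len(synthetic), "synthetic transitions generated")
```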