Off-Policy Correction For Multi-Agent Reinforcement Learning

12 Oct 2021 (modified: 01 Dec 2021), Deep RL Workshop, NeurIPS 2021
Keywords: Reinforcement Learning, V-Trace, Importance Sampling, Scalability
TL;DR: We introduce MA-Trace, a simple yet effective multi-agent reinforcement learning algorithm.
Abstract: Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents. Despite its apparent similarity to the single-agent case, multi-agent problems are often harder to train and to analyze theoretically. In this work, we propose MA-Trace, a new on-policy actor-critic algorithm, which extends V-Trace to the MARL setting. The key advantage of our algorithm is its high scalability in a multi-worker setting. To this end, MA-Trace utilizes importance sampling as an off-policy correction method, which allows distributing the computations with no impact on the quality of training. Furthermore, our algorithm is theoretically grounded: we prove a fixed-point theorem that guarantees convergence. We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms. MA-Trace achieves high performance on all of its tasks and exceeds state-of-the-art results on some of them.
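The off-policy correction the abstract refers to comes from V-Trace (Espeholt et al., 2018): importance weights between the learner's target policy and the workers' behaviour policies are clipped and folded into the value targets, so stale trajectories from distributed workers can still be used. As an illustration only (a single-trajectory NumPy sketch of the underlying V-Trace target computation, not the authors' multi-agent implementation), the recursion looks like this:

```python
import numpy as np

def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-Trace value targets for one trajectory of length T.

    behaviour_logp, target_logp: log-probs of the taken actions under the
    behaviour policy mu and the target policy pi, each shape [T].
    rewards, values: shape [T]; bootstrap_value: V(x_T).
    """
    ratios = np.exp(target_logp - behaviour_logp)  # pi(a|x) / mu(a|x)
    rhos = np.minimum(rho_bar, ratios)  # clipped weights on the TD errors
    cs = np.minimum(c_bar, ratios)      # clipped "trace-cutting" weights
    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + gamma * values_tp1 - values)
    # Backward recursion: v_s - V(x_s) = delta_s + gamma*c_s*(v_{s+1} - V(x_{s+1}))
    acc = 0.0
    vs_minus_v = np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v
```

When the behaviour and target policies coincide (all ratios equal 1) and gamma=1, the targets reduce to ordinary Monte Carlo returns, which is a useful sanity check; MA-Trace applies this style of correction with a centralized critic across agents.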