SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning

Yuhang Jiang; Jianzhun Shao; Shuncheng He; Hongchang Zhang; Xiangyang Ji

SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning

Yuhang Jiang, Jianzhun Shao, Shuncheng He, Hongchang Zhang, Xiangyang Ji

Published: 31 Oct 2022, Last Modified: 11 Jan 2023NeurIPS 2022 AcceptReaders: Everyone

Keywords: Unsupervised Reinforcement Learning, Multi-agent Reinforcement Learning

TL;DR: We propose SPD, a graph-based framework to learn generic coordination policies for multi-agent systems with no extrinsic reward.

Abstract: Reinforcement learning typically relies heavily on a well-designed reward signal, which gets more challenging in cooperative multi-agent reinforcement learning. Alternatively, unsupervised reinforcement learning (URL) has delivered on its promise in the recent past to learn useful skills and explore the environment without external supervised signals. These approaches mainly aimed for the single agent to reach distinguishable states, insufficient for multi-agent systems due to that each agent interacts with not only the environment, but also the other agents. We propose Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning (SPD) to learn generic coordination policies for agents with no extrinsic reward. Specifically, we devise the Synergy Pattern Graph (SPG), a graph depicting the relationships of agents at each time step. Furthermore, we propose an episode-wise divergence measurement to approximate the discrepancy of synergy patterns. To overcome the challenge of sparse return, we decompose the discrepancy of synergy patterns to per-time-step pseudo-reward. Empirically, we show the capacity of SPD to acquire meaningful coordination policies, such as maintaining specific formations in Multi-Agent Particle Environment and pass-and-shoot in Google Research Football. Furthermore, we demonstrate that the same instructive pretrained policy's parameters can serve as a good initialization for a series of downstream tasks' policies, achieving higher data efficiency and outperforming state-of-the-art approaches in Google Research Football.

Supplementary Material: pdf

16 Replies

Loading