Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization

Zhenggang Tang; Chao Yu; Boyuan Chen; Huazhe Xu; Xiaolong Wang; Fei Fang; Simon Shaolei Du; Yu Wang; Yi Wu

Published: 12 Jan 2021, Last Modified: 26 May 2025, ICLR 2021 Poster, Readers: Everyone

Keywords: strategic behavior, multi-agent reinforcement learning, reward randomization, diverse strategies

Abstract: We propose a simple, general and effective technique, Reward Randomization, for discovering diverse strategic policies in complex multi-agent games. Combining reward randomization with policy gradient, we derive a new algorithm, Reward-Randomized Policy Gradient (RPG). RPG is able to discover a set of multiple distinctive, human-interpretable strategies in challenging temporal trust dilemmas, including grid-world games and the real-world game Agar.io, where multiple equilibria exist but standard multi-agent policy gradient algorithms always converge to a fixed one with a sub-optimal payoff for every player, even when using state-of-the-art exploration techniques. Furthermore, with the set of diverse strategies from RPG, we can (1) achieve higher payoffs by fine-tuning the best policy from the set; and (2) obtain an adaptive agent by using this set of strategies as its training opponents.
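A minimal sketch of one reading of the pipeline outlined in the abstract follows. It randomizes the game's rewards, runs policy gradient in each randomized game, picks the resulting policy that scores best under the original reward, and then fine-tunes that policy on the original reward. It assumes a two-player Stag Hunt matrix game with exact (not sampled) policy gradients on sigmoid-parameterized policies. The payoff values, the uniform [-1, 1] sampling range, the step counts, and every function name here (`policy_gradient`, `evaluate`, `rpg`) are hypothetical choices for illustration. None of them is taken from the paper or from the staghuntrpg/RPG repository.

```python
import math
import random

STAG, HARE = 0, 1  # action indices; p = probability of choosing Stag


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def expected_payoff(R, p_self, p_other):
    """Expected payoff of one agent, where R[own_action][other_action] is its reward."""
    probs_self = (p_self, 1.0 - p_self)
    probs_other = (p_other, 1.0 - p_other)
    return sum(probs_self[x] * probs_other[y] * R[x][y]
               for x in (STAG, HARE) for y in (STAG, HARE))


def policy_gradient(R, theta, steps=2000, lr=1.0):
    """Simultaneous exact gradient ascent of each agent on its own expected payoff.

    Each agent has one logit; dJ_i/dtheta_i = dJ_i/dp_i * p_i * (1 - p_i).
    """
    t1, t2 = theta
    for _ in range(steps):
        p1, p2 = sigmoid(t1), sigmoid(t2)
        g1 = (p2 * (R[0][0] - R[1][0]) + (1 - p2) * (R[0][1] - R[1][1])) * p1 * (1 - p1)
        g2 = (p1 * (R[0][0] - R[1][0]) + (1 - p1) * (R[0][1] - R[1][1])) * p2 * (1 - p2)
        t1, t2 = t1 + lr * g1, t2 + lr * g2
    return t1, t2


def evaluate(R, theta):
    """Mean per-agent expected payoff of the joint policy under reward matrix R."""
    p1, p2 = sigmoid(theta[0]), sigmoid(theta[1])
    return 0.5 * (expected_payoff(R, p1, p2) + expected_payoff(R, p2, p1))


def rpg(R_orig, num_samples=30, seed=0):
    """Hypothetical RPG-style loop: randomize rewards, train, select, fine-tune."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(num_samples):
        # Assumption: each payoff entry is resampled uniformly from [-1, 1].
        a, b, c, d = (rng.uniform(-1.0, 1.0) for _ in range(4))
        R_rand = [[a, d], [b, c]]
        candidates.append(policy_gradient(R_rand, (0.0, 0.0)))
    # Select the candidate that performs best under the ORIGINAL reward...
    best = max(candidates, key=lambda th: evaluate(R_orig, th))
    # ...then fine-tune it on the original reward.
    return policy_gradient(R_orig, best)


# Illustrative Stag Hunt: both Stag -> 4, lone Stag -> -10, Hare vs Stag -> 3, both Hare -> 1.
R = [[4.0, -10.0], [3.0, 1.0]]
baseline = policy_gradient(R, (0.0, 0.0))
print("plain PG payoff:", evaluate(R, baseline))
print("RPG-style payoff:", evaluate(R, rpg(R)))
```

With these illustrative payoffs, plain gradient ascent from a uniform policy slides to the Hare-Hare equilibrium (about 1 per player), because the risky lone-Stag penalty dominates the early gradient. Some randomized games instead make Stag attractive, and the policies they produce survive fine-tuning on the original reward, which reaches the Stag-Stag payoff of about 4. Exact gradients are used only to keep the sketch deterministic; a sampled estimator such as REINFORCE or PPO would replace them in an environment like Agar.io.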

One-sentence Summary: We propose an MARL algorithm, RPG, which discovers diverse non-trivial strategic behavior in several challenging multi-agent games.

Code: [staghuntrpg/RPG](https://github.com/staghuntrpg/RPG) + [Papers with Code: 1 community implementation](https://paperswithcode.com/paper/?openreview=lvRTC669EY_)

Community Implementations: [CatalyzeX: 1 code implementation](https://www.catalyzex.com/paper/discovering-diverse-multi-agent-strategic/code)
