Discovering Diverse Multi-Agent Strategic Behavior via Reward RandomizationDownload PDF

Published: 12 Jan 2021, Last Modified: 05 May 2023ICLR 2021 PosterReaders: Everyone
Keywords: strategic behavior, multi-agent reinforcement learning, reward randomization, diverse strategies
Abstract: We propose a simple, general and effective technique, Reward Randomization for discovering diverse strategic policies in complex multi-agent games. Combining reward randomization and policy gradient, we derive a new algorithm, Reward-Randomized Policy Gradient (RPG). RPG is able to discover a set of multiple distinctive human-interpretable strategies in challenging temporal trust dilemmas, including grid-world games and a real-world game, where multiple equilibria exist but standard multi-agent policy gradient algorithms always converge to a fixed one with a sub-optimal payoff for every player even using state-of-the-art exploration techniques. Furthermore, with the set of diverse strategies from RPG, we can (1) achieve higher payoffs by fine-tuning the best policy from the set; and (2) obtain an adaptive agent by using this set of strategies as its training opponents.
One-sentence Summary: We propose an MARL algorithm, RPG, which discovers diverse non-trivial strategic behavior in several challenging multi-agent games.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Code: [![github](/images/github_icon.svg) staghuntrpg/RPG]( + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](
12 Replies