Off-Beat Multi-Agent Reinforcement Learning

Wei Qiu; Weixun Wang; Rundong Wang; Bo An; Yujing Hu; Svetlana Obraztsova; Zinovi Rabinovich; Jianye HAO; Yingfeng Chen; Changjie Fan

16 May 2022 (modified: 05 May 2023), submitted to NeurIPS 2022. Readers: Everyone

Keywords: multi-agent system, multi-agent reinforcement learning

Abstract: We investigate model-free multi-agent reinforcement learning (MARL) in environments where off-beat actions are prevalent, i.e., all actions have pre-set execution durations. During these execution durations, changes in the environment are influenced by, but not synchronised with, action execution. Such settings are ubiquitous in real-world problems. However, most MARL methods assume that actions are executed immediately after inference, which is often unrealistic and can cause multi-agent coordination with off-beat actions to fail catastrophically. To fill this gap, we develop an algorithmic framework for MARL with off-beat actions. We then propose LeGEM, a novel episodic memory for model-free MARL algorithms. LeGEM builds agents’ episodic memories from their individual experiences. It boosts multi-agent learning by addressing the challenging temporal credit assignment problem raised by off-beat actions through a novel reward redistribution scheme, which alleviates the issue of non-Markovian rewards. We evaluate LeGEM on various multi-agent scenarios with off-beat actions, including the Stag-Hunter Game, Quarry Game, Afforestation Game, and StarCraft II micromanagement tasks. Empirical results show that LeGEM significantly improves multi-agent coordination, achieving leading performance and improved sample efficiency.
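To make the setting concrete, the following is a minimal Python sketch, built on our own assumptions, of how off-beat actions delay rewards and how a reward redistribution can move credit back onto the steps during which an action was executing. The `OffBeatAction` class and the `delayed_rewards` and `uniform_redistribution` functions are hypothetical names introduced for illustration. The uniform split is a deliberately simple stand-in, not LeGEM's redistribution scheme, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OffBeatAction:
    """Hypothetical model of an off-beat action: its reward lands only when execution ends."""
    agent: int      # which agent issued the action (kept for readability; unused below)
    start: int      # step at which the agent commits to the action
    duration: int   # pre-set execution duration, in environment steps
    reward: float   # reward emitted by the environment when the action completes

    def __post_init__(self) -> None:
        if self.duration < 1:
            raise ValueError("duration must be at least one step")

    @property
    def finish(self) -> int:
        # Last step of execution; duration == 1 reduces to an immediate action.
        return self.start + self.duration - 1


def delayed_rewards(actions: List[OffBeatAction], horizon: int) -> List[float]:
    """Team reward per step as the environment emits it: each action pays out at its finish step."""
    rewards = [0.0] * horizon
    for a in actions:
        if a.finish < horizon:  # actions still running at episode end never pay out
            rewards[a.finish] += a.reward
    return rewards


def uniform_redistribution(actions: List[OffBeatAction], horizon: int) -> List[float]:
    """Spread each completed action's reward evenly over the steps it was executing.

    Illustrative stand-in only; this is NOT LeGEM's reward redistribution scheme.
    """
    rewards = [0.0] * horizon
    for a in actions:
        if a.finish >= horizon:
            continue  # unfinished actions earned nothing, so there is nothing to redistribute
        share = a.reward / a.duration
        for t in range(a.start, a.finish + 1):
            rewards[t] += share
    return rewards


if __name__ == "__main__":
    acts = [
        OffBeatAction(agent=0, start=0, duration=3, reward=3.0),
        OffBeatAction(agent=1, start=1, duration=2, reward=1.0),
    ]
    print(delayed_rewards(acts, horizon=4))         # [0.0, 0.0, 4.0, 0.0]
    print(uniform_redistribution(acts, horizon=4))  # [1.0, 1.5, 1.5, 0.0]
```

In this example both actions finish at step 2, so the delayed signal lumps 4.0 reward onto a single step with no indication of which agent or action earned it, and that reward is not a function of the state and joint action at step 2 alone. The redistributed signal preserves the episode total while assigning credit to the steps each action actually occupied.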
