- Keywords: multi-agent reinforcement learning, risk-sensitive reinforcement learning, reinforcement learning, distributional reinforcement learning
- Abstract: In cooperative multi-agent reinforcement learning, state transitions, rewards, and actions can all induce randomness (or uncertainty) in the observed long-term returns. These randomnesses are reflected from two risk sources: (a) agent-wise risk (i.e., how cooperative our teammates act for a given agent) and (b) environment-wise risk (i.e., transition stochasticity). Although these two sources are both important factors for learning robust policies of agents, prior works do not separate them or deal with only a single risk source, which could lead to suboptimal equilibria. In this paper, we propose Disentangled RIsk-sensitive Multi-Agent reinforcement learning (DRIMA), a novel framework being capable of disentangling risk sources. Our main idea is to separate risk-level leverages (i.e., quantiles) in both centralized training and decentralized execution with a hierarchical quantile structure and quantile regression. Our experiments demonstrate that DRIMA significantly outperforms prior-arts across various scenarios in StarCraft Multi-agent Challenge. Notably, DRIMA shows robust performance regardless of reward shaping, exploration schedule, where prior methods learn only a suboptimal policy.
- One-sentence Summary: We propose a novel distributional multi-agent reinforcement learning algorithm with state-of-the-art performance.
- Supplementary Material: zip