Normalizing Flow Policies for Multi-agent Systems

GameSec 2020 (modified: 08 May 2021)
Abstract: Stochastic policy gradient methods using neural representations have had considerable success in single-agent domains with continuous action spaces. These methods typically use networks that output the parameters of a diagonal Gaussian distribution from which the resulting action is sampled. In multi-agent contexts, however, better policies may require complex multimodal action distributions. Based on recent progress in density modeling, we propose an alternative for policy representation in the form of conditional normalizing flows. This approach allows for greater flexibility in action distribution representation beyond mixture models. We demonstrate their advantage over standard methods on a set of tasks including human behavior modeling and reinforcement learning in multi-agent settings.
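The core idea of a conditional normalizing flow policy can be sketched as follows: sample from a simple base distribution, then push the sample through a stack of invertible, state-conditioned transforms, tracking the log-density via the change-of-variables formula. This is a minimal numpy sketch using RealNVP-style affine coupling layers, not the authors' implementation; all class and function names (`CouplingLayer`, `FlowPolicy`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_params(in_dim, out_dim, hidden=16):
    # Hypothetical tiny two-layer MLP used as the conditioner network.
    return {
        "W1": rng.normal(0, 0.5, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.5, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def mlp(p, x):
    h = np.tanh(x @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"]

class CouplingLayer:
    """One affine coupling layer for a 2-D action, conditioned on the state.

    One action dimension is left unchanged and, together with the state,
    parameterizes an affine transform of the other dimension. The Jacobian
    of this transform is triangular, so log|det J| is just the log-scale.
    """
    def __init__(self, state_dim, flip):
        self.flip = flip
        # Conditioner maps [state, untouched dim] -> (log_scale, shift).
        self.net = mlp_params(state_dim + 1, 2)

    def forward(self, state, z):
        i, j = (1, 0) if self.flip else (0, 1)   # i: conditioning dim, j: transformed dim
        out = mlp(self.net, np.concatenate([state, [z[i]]]))
        log_s, t = out[0], out[1]
        y = z.copy()
        y[j] = z[j] * np.exp(log_s) + t
        return y, log_s                           # log|det J| = log_s

class FlowPolicy:
    """Conditional normalizing flow over a 2-D continuous action space.

    Alternating which dimension is transformed lets the stacked layers
    express distributions far beyond a diagonal Gaussian.
    """
    def __init__(self, state_dim, n_layers=4):
        self.layers = [CouplingLayer(state_dim, flip=(k % 2 == 1))
                       for k in range(n_layers)]

    def sample(self, state):
        z = rng.standard_normal(2)                          # base sample ~ N(0, I)
        log_prob = -0.5 * np.sum(z ** 2) - np.log(2 * np.pi)  # base log-density (d = 2)
        for layer in self.layers:
            z, log_det = layer.forward(state, z)
            log_prob -= log_det                              # change-of-variables correction
        return z, log_prob

policy = FlowPolicy(state_dim=3)
action, logp = policy.sample(np.array([0.1, -0.2, 0.5]))
```

The resulting `log_prob` is exactly what a stochastic policy gradient method needs: it is differentiable in the conditioner parameters, so the same score-function or reparameterized gradient machinery used with Gaussian policies applies unchanged.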
