MACE: Structured Exploration via Deep Hierarchical Coordination


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission
  • Abstract: In multi-agent reinforcement learning environments, exploration can be inefficient because the joint policy space is, in general, exponentially large. Hence, even if the dynamics are simple, the optimal policy can be combinatorially hard to discover. We introduce Multi-Agent Coordinated Exploration (MACE), a deep reinforcement learning approach to learn expressive multi-agent policies that are both statistically and computationally efficient. MACE implements a form of multi-agent coordination during exploration and execution by using a hierarchical stochastic multi-agent policy class that encodes flexible structure between individual policies. In this way, MACE preserves expressivity while maintaining low sample complexity. To make learning tractable, we derive a joint learning and exploration strategy by combining hierarchical variational inference with actor-critic learning. The benefits of our learning approach are that it is 1) principled, 2) simple to implement, and 3) easily scalable to settings with many agents. We demonstrate empirically that MACE learns optimal policies more efficiently than conventional baselines in challenging multi-agent games with a large number (∼20) of agents. Moreover, we show that our hierarchical structure leads to meaningful agent coordination.
  • TL;DR: Explore more efficiently in deep multi-agent reinforcement learning through agent coordination with structured policies.
  • Keywords: Multi-agent Learning, Deep Reinforcement Learning, Structured Variational Inference, Multi-agent Coordination
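The hierarchical policy class described in the abstract can be illustrated with a minimal sketch: a shared latent variable is sampled from a prior and each agent's policy conditions on both its own observation and that latent, so the latent coordinates the joint action. All names, dimensions, and the linear policy form below are illustrative assumptions; the abstract does not specify the architecture, and the learned components (the latent prior and policy weights, trained in the paper via hierarchical variational inference with actor-critic learning) are replaced here by fixed random parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- chosen only for illustration.
N_AGENTS, OBS_DIM, LATENT_DIM, N_ACTIONS = 20, 8, 4, 5

# Shared latent prior (in the paper this would be learned; fixed here).
latent_mean = np.zeros(LATENT_DIM)
latent_std = np.ones(LATENT_DIM)

# Per-agent policy weights mapping [observation; latent] -> action logits.
W = rng.normal(size=(N_AGENTS, OBS_DIM + LATENT_DIM, N_ACTIONS))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def act(observations):
    """Sample one joint action: a single shared latent z coordinates agents."""
    z = rng.normal(latent_mean, latent_std)       # sampled once, shared by all
    actions = []
    for i, obs in enumerate(observations):
        logits = np.concatenate([obs, z]) @ W[i]  # agent i conditions on z
        probs = softmax(logits)
        actions.append(int(rng.choice(N_ACTIONS, p=probs)))
    return actions

obs = rng.normal(size=(N_AGENTS, OBS_DIM))
joint_action = act(obs)
print(len(joint_action))  # one action per agent
```

Because every agent conditions on the same sample of z, exploration is correlated across agents rather than independent, which is one plausible reading of how structured exploration reduces the effective search over the exponentially large joint policy space.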