Abstract: In cooperative multi-agent reinforcement learning (MARL), agents can often observe the environment state only partially, so communication is crucial to achieving coordination. Communicating agents must simultaneously learn to whom to communicate (i.e., the communication topology) and how to interpret received messages for decision-making. Although agents can efficiently learn to interpret messages through end-to-end backpropagation, learning the communication topology is much harder because the binary decisions of whether to communicate impede end-to-end differentiation. As our experiments show, existing solutions, such as reparameterization tricks and reformulating topology learning as reinforcement learning, often fall short. This paper introduces a meta-learning framework that aims to discover and continually adapt the update rules for communication topology learning. Empirical results show that our meta-learning approach outperforms existing alternatives on a range of cooperative MARL tasks and generalizes reasonably well to tasks different from those used in meta-training. Preliminary analyses suggest that, interestingly, the discovered update rules occasionally resemble human-designed rules such as policy gradients, yet remain qualitatively different in most cases.
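The differentiation problem mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's method: the objective `team_value`, the message value, and the logit are hypothetical placeholders, and the sigmoid gate stands in for a generic continuous relaxation of the binary decision. The sketch only shows that a hard communicate-or-not gate yields a zero gradient with respect to its logit, while a relaxed gate does not.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hard_gate(logit):
    # Binary decision: 1.0 means "send the message", 0.0 means "stay silent".
    return 1.0 if logit > 0.0 else 0.0


def team_value(gate, message=2.0, baseline=0.5):
    # Hypothetical downstream objective that grows with the gated message.
    return baseline + gate * message


def finite_diff(f, x, eps=1e-4):
    # Central finite-difference estimate of df/dx.
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


logit = 0.3  # assumed value, chosen away from the threshold at 0

# The hard gate is piecewise constant, so the objective is flat around the logit.
grad_hard = finite_diff(lambda z: team_value(hard_gate(z)), logit)

# A sigmoid relaxation keeps the objective smooth in the logit.
grad_soft = finite_diff(lambda z: team_value(sigmoid(z)), logit)

print("gradient through hard gate:", grad_hard)
print("gradient through relaxed gate:", grad_soft)
```

With the hard gate, a gradient-based learner receives no signal about whether to open or close a communication link, which is why the topology has to be learned by some other route, such as a relaxation, a reinforcement learning formulation, or a learned update rule.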
Contribution Process Agreement: Yes
Poster Session Selection: Poster session #2 (15:00 UTC), Poster session #3 (16:50 UTC)