Keywords: Learning to Communicate, Multi-Agent Reinforcement Learning, Sample and Computation Complexity, Information Structures
TL;DR: We study the learning-to-communicate problem in multi-agent RL and provide provable planning and learning algorithms.
Abstract: Learning-to-communicate (LTC) in partially observable environments has gained increasing attention in deep multi-agent reinforcement learning, where the control and communication strategies are \emph{jointly} learned. On the other hand, the impact of communication has been extensively studied in control theory, through the lens of \emph{information structures} (ISs). In this paper, we seek to formalize and better understand LTC by bridging these two lines of work. To this end, we formalize LTC in decentralized partially observable Markov decision processes (Dec-POMDPs) and classify LTCs based on their ISs. We first show that non-classical LTCs are computationally intractable, and thus focus on quasi-classical (QC) LTCs. We then propose a series of conditions for QC LTCs, whose violation can cause computational hardness in general. Further, we develop provable planning and learning algorithms for QC LTCs, and show that examples of QC LTCs satisfying the above conditions can be solved without computationally intractable oracles. Along the way, we also establish relationships between (strictly) QC ISs and the condition of having strategy-independent common-information-based beliefs (SI-CIB), the only known condition that enables planning/learning in Dec-POMDPs without computationally intractable oracles, as well as results on solving general Dec-POMDPs beyond those with SI-CIB, which may be of independent interest.
Submission Number: 7