Track: Research Track
Keywords: Learning to Communicate, Multi-Agent Reinforcement Learning, Sample and Computation Complexity, Information Structures
Abstract: Learning-to-communicate (LTC) in partially observable environments, where control and communication strategies are \emph{jointly} learned, has emerged and received increasing attention in deep multi-agent reinforcement learning. Separately, the impact of communication has been extensively studied in control theory. In this paper, we seek to formalize and better understand LTC by bridging these two lines of work through the lens of \emph{information structures} (ISs). To this end, we formalize LTC in decentralized partially observable Markov decision processes (Dec-POMDPs) under the common-information-based (CIB) framework, and classify LTCs according to their ISs before additional information sharing. We first show that non-classical LTCs are computationally intractable in general, and thus focus on quasi-classical (QC) LTCs. We then propose a series of conditions for QC LTCs whose violation can cause computational hardness in general. Further, we develop provable planning and learning algorithms for QC LTCs, and show that examples of QC LTCs satisfying the above conditions can be solved without computationally intractable oracles. Along the way, we also establish relationships between (strictly) QC ISs and the condition of strategy-independent CIB beliefs (SI-CIB), and show how to solve general Dec-POMDPs beyond those satisfying SI-CIB without computationally intractable oracles, which may be of independent interest.
Submission Number: 120