From Emergence to Intention: A Statistical Inductive Bias for Tractable Optimization in Multi-Agent Coordination
Keywords: Non-convex Optimization, Multi-Agent Reinforcement Learning (MARL), Statistical World Models, Inductive Bias, Sample Efficiency, Emergent Communication, Optimization for Deep Learning, Scalability
TL;DR: We show that structuring the non-convex optimization of multi-agent communication around a learned statistical forward model, which agents use to generate and share predictive plans, dramatically improves scalability and stability over purely emergent, end-to-end approaches.
Abstract: Cooperative multi-agent reinforcement learning (MARL) presents a formidable non-convex optimization challenge, exacerbated by the non-stationarity introduced by concurrently learning agents. A common approach is to enable communication, allowing agents to coordinate. In this paper, we investigate how the structure of this communication impacts the tractability of the underlying optimization problem. We first analyze an end-to-end learned protocol, Learned Direct Communication (LDC), where messages emerge as part of the policy optimization. We then propose Intention Communication, an alternative that structures the optimization by integrating a learned statistical world model. Agents use this model to generate predictive trajectories via self-simulated rollouts and then communicate a compressed summary of this intent. This approach injects a strong inductive bias, reframing the optimization problem from discovering a communication protocol from scratch to learning to interpret explicit plans. We evaluate these methods on a cooperative task-allocation problem under partial observability. Our results demonstrate that while emergent communication is viable in simple settings, its performance degrades with scale. In contrast, by structuring the problem with a predictive statistical model, Intention Communication dramatically improves optimization stability and sample efficiency, achieving near-optimal performance in complex environments where other methods fail. This work underscores the efficacy of bridging statistical modeling with optimization to solve complex coordination problems.
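The Intention Communication loop described in the abstract (self-simulated rollout through a world model, then a compressed summary sent as the message) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the function names `rollout`, `compress`, and `intention_message`, the additive toy dynamics, the constant toy policy, and the use of mean-pooling as a stand-in for whatever learned encoder the authors use.

```python
from typing import Callable, List

Vector = List[float]
Policy = Callable[[Vector], Vector]
WorldModel = Callable[[Vector, Vector], Vector]


def rollout(state: Vector, policy: Policy, world_model: WorldModel,
            horizon: int) -> List[Vector]:
    """Simulate `horizon` steps inside the agent's own forward model."""
    trajectory: List[Vector] = []
    current = list(state)
    for _ in range(horizon):
        action = policy(current)
        current = world_model(current, action)  # predicted next state
        trajectory.append(current)
    return trajectory


def compress(trajectory: List[Vector]) -> Vector:
    """Mean-pool predicted states into one fixed-size vector.

    Hypothetical stand-in for a learned compression module.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one state")
    dim = len(trajectory[0])
    steps = len(trajectory)
    return [sum(s[i] for s in trajectory) / steps for i in range(dim)]


def intention_message(state: Vector, policy: Policy,
                      world_model: WorldModel, horizon: int) -> Vector:
    """Build the message an agent would broadcast: a summary of its plan."""
    return compress(rollout(state, policy, world_model, horizon))


# Toy components (assumptions): additive dynamics and a constant policy.
def toy_world_model(state: Vector, action: Vector) -> Vector:
    return [s + a for s, a in zip(state, action)]


def toy_policy(state: Vector) -> Vector:
    return [1.0, 0.0]  # always step one unit along the first axis


if __name__ == "__main__":
    print(intention_message([0.0, 0.0], toy_policy, toy_world_model, horizon=3))
```

In this toy setting the receiver gets a vector summarizing where the sender expects to be over the next few steps, rather than an unstructured learned symbol. That is the inductive bias the abstract refers to: the receiver only has to learn to interpret a plan, not to co-invent a protocol.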
Submission Number: 19