Learning What to Say and How Precisely: Efficient Communication via Differentiable Discrete Communication Learning

ICLR 2026 Conference Submission 13887 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Multi-Agent Reinforcement Learning (MARL), Differentiable Communication, Communication Efficiency, Discrete Communication, Message Precision, Unbiased Gradients
TL;DR: We created a universal, plug-and-play DDCL layer for MARL that learns message precision, cutting bandwidth by >10x without performance loss. It also lets a simple Transformer match complex, bespoke architectures.
Abstract: Effective communication in multi-agent reinforcement learning (MARL) is critical for success but constrained by bandwidth; prior approaches rely on complex gating mechanisms that decide only whether to communicate, not how precisely. Learning to optimize message precision at the bit level is fundamentally harder, as the required discretization step breaks gradient flow. We address this by generalizing Differentiable Discrete Communication Learning (DDCL), a framework for end-to-end optimization of discrete messages. Our primary contribution is an extension of DDCL to support unbounded signals, transforming it into a universal, plug-and-play layer for any MARL architecture. We verify our approach with three key results. First, through a qualitative analysis in a controlled environment, we demonstrate how agents learn to dynamically modulate message precision according to the informational needs of the task. Second, we integrate our variant of DDCL into four state-of-the-art MARL algorithms, showing that it reduces bandwidth by over an order of magnitude while matching or exceeding task performance. Finally, we provide direct evidence for the "Bitter Lesson" in MARL communication: a simple Transformer-based policy leveraging DDCL matches the performance of complex, specialized architectures, questioning the necessity of bespoke communication designs.
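The abstract's central obstacle, that rounding a message to a fixed number of bits is non-differentiable, is commonly worked around with straight-through-style estimators: the forward pass quantizes, while the backward pass treats the rounding as the identity so gradients still reach the precision parameter. The sketch below illustrates that general idea only; the function name, clipping range, and quantization scheme are illustrative assumptions and are not the paper's actual DDCL formulation.

```python
import numpy as np


def quantize_message(message, num_bits, value_range=4.0):
    """Quantize a continuous message vector to 2**num_bits levels.

    Forward pass of a straight-through-style quantizer: values are
    clipped to [-value_range, value_range] and snapped to a uniform
    grid. In a differentiable setup the backward pass would pass
    gradients through the rounding unchanged, allowing num_bits (or a
    continuous precision proxy) to be learned end to end.
    """
    levels = 2 ** num_bits
    step = 2.0 * value_range / (levels - 1)  # grid spacing
    clipped = np.clip(message, -value_range, value_range)
    # Snap each component to the nearest grid point.
    return np.round((clipped + value_range) / step) * step - value_range


# Coarse precision (1 bit) collapses messages to two symbols, while
# higher precision (8 bits) preserves them almost exactly, which is
# the bandwidth/fidelity trade-off an agent would modulate per message.
msg = np.array([0.1, -2.3, 3.7])
fine = quantize_message(msg, num_bits=8)
coarse = quantize_message(msg, num_bits=1)
```

At 8 bits the reconstruction error is bounded by half the grid spacing, whereas at 1 bit every component is forced to one of two extreme values; a learned precision layer chooses where on this spectrum each message needs to sit.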
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 13887