Calibrated Spiking Messages for Emergent Multi-Agent Communication

ICLR 2026 Conference Submission18583 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Emergent communication; Multi-agent reinforcement learning; Spiking neural networks; Neuromorphic computing; Bandwidth-aware communication; Confidence calibration; Temporal coding; Referential games.
TL;DR: Calibrated, bandwidth-aware MARL with spiking messages from a pretrained encoder yields accurate, sample-efficient, well-organised protocols that use fewer synaptic operations than continuous baselines and are suited to neuromorphic deployment.
Abstract: We study emergent communication in multi-agent reinforcement learning (MARL) via a calibrated, bandwidth-aware framework in which agents exchange spiking messages built on a pretrained perceptual code. Agents share a spiking encoder (COMMSMOD) trained with a prototype–contrastive–sparsity objective, and each uses an independent attention-based decision head (DECISIONMOD) trained with calibration-aware Q-learning. In referential games on Fashion-MNIST, with agents alternating between sender and receiver roles, we assess protocol quality using within- vs. between-class similarity, temporal attention consistency, and calibration; spike count serves as a bandwidth proxy. Experiments show that the spiking channel yields accurate and sample-efficient communication, improves protocol discriminability, and reduces synaptic operations relative to a matched continuous ANN baseline. Ablations show that (i) the shared pretrained encoder, (ii) temporal attention, and (iii) the calibration terms are each necessary. Overall, semantically anchored, calibrated spiking communication offers a favourable accuracy–robustness–bandwidth trade-off and a practical route to neuromorphic deployment.
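The abstract names within- vs. between-class similarity, calibration, and spike count as protocol-quality and bandwidth measures, but does not give their formulas. The Python sketch that follows shows one plausible way to compute them, under loudly stated assumptions: each message is a binary spike train stored as T timesteps by D channels, similarity is cosine similarity between flattened trains, and calibration is measured with standard expected calibration error (ECE) over equal-width confidence bins. These definitions and the helper names (class_similarity, spike_count, expected_calibration_error) are hypothetical illustrations, not the authors' implementation.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# Assumption (hypothetical): a message is a binary spike train, T timesteps x D channels.
SpikeTrain = Sequence[Sequence[int]]


def _flatten(msg: SpikeTrain) -> List[float]:
    """Concatenate all timesteps of a spike train into one vector."""
    return [float(s) for step in msg for s in step]


def _cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; defined as 0.0 if either message is silent."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs) if xs else float("nan")


def class_similarity(messages: Sequence[SpikeTrain],
                     labels: Sequence[int]) -> Tuple[float, float]:
    """Mean pairwise cosine similarity within classes and between classes."""
    flat = [_flatten(m) for m in messages]
    within: List[float] = []
    between: List[float] = []
    for i, j in combinations(range(len(flat)), 2):
        sim = _cosine(flat[i], flat[j])
        (within if labels[i] == labels[j] else between).append(sim)
    return _mean(within), _mean(between)


def spike_count(messages: Sequence[SpikeTrain]) -> float:
    """Bandwidth proxy: mean number of spikes emitted per message."""
    return sum(sum(sum(step) for step in m) for m in messages) / len(messages)


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[int],
                               n_bins: int = 10) -> float:
    """Standard ECE with equal-width bins over (0, 1] (a stand-in calibration metric)."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bin b covers (lo, hi]; a confidence of exactly 0.0 goes into the first bin.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - conf)
    return ece


if __name__ == "__main__":
    # Toy messages: two classes, two timesteps, three channels.
    msgs = [
        [[1, 0, 0], [1, 0, 0]],  # class 0
        [[1, 0, 0], [0, 0, 0]],  # class 0
        [[0, 0, 1], [0, 1, 1]],  # class 1
        [[0, 0, 1], [0, 0, 1]],  # class 1
    ]
    labels = [0, 0, 1, 1]
    w, b = class_similarity(msgs, labels)
    print(f"within={w:.3f} between={b:.3f} gap={w - b:.3f}")
    print(f"mean spikes/message={spike_count(msgs):.2f}")
    print(f"ECE={expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0]):.3f}")
```

Under these assumptions, a larger gap between within-class and between-class similarity indicates more class-discriminative messages, and a lower mean spike count indicates less bandwidth use per message.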
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 18583