What Do Latent Agents Actually Represent? Interpreting Hidden-State Communication in Multi-Agent Systems

Published: 27 May 2026, Last Modified: 27 May 2026
Venue: CompLearn 2026 Poster
License: CC BY 4.0
Keywords: latent multi-agent systems, hidden-state communication, mechanistic interpretability, redundancy, linear separability, task identity, agent roles, base-model geometry, collaboration, reasoning benchmarks
TL;DR: We analyze latent multi-agent systems and find that their hidden communication mostly reflects redundant base-model representations rather than genuine emergent collaboration.
Abstract: Latent multi-agent systems replace text communication between agents with direct hidden-state transfer, promising richer inter-agent collaboration. Yet what information actually flows through these latent channels remains poorly understood. We present a systematic mechanistic analysis of a four-agent system evaluated on mathematical reasoning, science multiple-choice, and code generation benchmarks. We find that task identity is nearly perfectly linearly separable in the latent space, that individual agent roles are recoverable with near-perfect accuracy from hidden states alone, and that the vast majority of the representation space is effectively unused. Correctness information peaks at mid-network layers, yet this signal exists equally in single-agent baselines, revealing it as a base-model property rather than an emergent collaborative signal. Agents are highly informationally redundant with one another despite exhibiting distinct role signatures. Taken together, our results suggest that the apparent richness of latent communication reflects base-model geometry rather than learned compositional coordination.
Submission Number: 151