Not All Who Wander Are Lost: Hallucinations as Neutral Dynamics in Residual Transformers


ICLR 2026 Conference Submission 25471 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Transformer architectures, Mean-field Games, Hallucinations, Stability and Dynamics

Abstract: We separate the onset of hallucinations from their persistence and prove that persistence follows from the neutral dynamics of pre-LayerNorm residual transformers. Exact operator norms for LayerNorm, residual blocks, and the softmax decoder yield conservative upper bounds showing the absence of contractive or expansive bias at the decoded level. These bounds are sharpened using corridor constants that remain explicit and falsifiable. For open probes, drift decomposes into a predictable component bounded by the sharpened corridor and a centered martingale component controlled by concentration and central limit arguments. Neutrality is then lifted from paired rollouts to populations by casting trajectories or blocks as exchangeable agents in a mean-field game, yielding a population invariant that is stable under depth and width scaling. Predictions are tested with controlled randomization audits on models up to GPT2-large: closed probes are centered and behave as bounded martingale differences, while open-probe drift stays within the predicted corridor, with magnitudes consistent with the sharper constants. Together, these theoretical and empirical results provide the first structural account of persistence, explaining why hallucinations persist across model scales without re-auditing hundreds of millions of parameters, and showing that interventions that do not alter the residual backbone cannot eliminate persistence once onset has occurred.

Supplementary Material: zip

Primary Area: generative models

Submission Number: 25471
