Track: tiny / short paper (up to 5 pages)
Domain: machine learning
Abstract: The Reversal Curse describes a failure of autoregressive language models to retrieve a fact in reverse order (e.g., training on “A > B” but failing on “B < A”). Recent work shows that objectives with bidirectional supervision (e.g., bidirectional attention or masking-based reconstruction for decoder-only models) can mitigate the reversal curse. We extend this evaluation to include a vanilla masked language modeling (MLM) objective, compare it to decoder-only masking-based training across four reversal benchmarks, and then provide a minimal mechanistic study of how these objectives succeed. We show that reversal accuracy requires a training signal that explicitly makes the source entity a prediction target, and we find little evidence that success corresponds to a single direction-agnostic representation of a fact. Instead, representation distances and linear probes are consistent with storing forward and reverse directions as distinct entries, with different indexing geometry for MLM versus decoder-only masking-based training. Our results caution that objective-level “fixes” can improve reversal behavior without necessarily inducing the kind of latent generalization one might expect from a unified concept.
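
As a minimal sketch of the distinction the abstract draws (the tokenization, entity names, and span indices below are illustrative assumptions, not the paper's actual pipeline), the following Python snippet contrasts which positions supply training signal under a standard left-to-right objective versus an MLM-style objective that masks the source entity, making it an explicit prediction target:

    # Hedged sketch: which token positions are prediction targets for the
    # source entity in the fact "A is B", under two training objectives.
    # Tokenization and entity names are illustrative assumptions only.

    tokens = ["Tom", "Cruise's", "mother", "is", "Mary", "Lee", "Pfeiffer"]
    source_span = range(0, 2)   # "Tom Cruise's" (the entity queried in reverse)
    target_span = range(4, 7)   # "Mary Lee Pfeiffer"

    # 1) Left-to-right (decoder-only) objective: each token is predicted only
    #    from its left context, so the source entity, which appears first,
    #    is never predicted conditioned on the target entity.
    next_token_targets = [(i, tokens[i]) for i in range(1, len(tokens))]

    # 2) MLM-style objective: masking the source span forces the model to
    #    reconstruct the source entity from the surrounding context,
    #    i.e., the source entity becomes an explicit prediction target.
    masked_input = [("[MASK]" if i in source_span else t) for i, t in enumerate(tokens)]
    mlm_targets = [(i, tokens[i]) for i in source_span]

    print("left-to-right targets:", next_token_targets)
    print("masked input:         ", masked_input)
    print("MLM targets:          ", mlm_targets)

The sketch only illustrates where the loss is applied under each objective; it does not implement either training procedure.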
Presenter: ~Julian_Coda-Forno1
Submission Number: 38