Track: tiny / short paper (up to 5 pages)
Domain: machine learning
Abstract: The Reversal Curse describes a failure of autoregressive language models to retrieve a fact in reverse order (e.g., training on “A > B” but failing on “B < A”). Recent work shows that objectives with bidirectional supervision (e.g., bidirectional attention or masking-based reconstruction for decoder-only models) can mitigate the reversal curse. We extend this evaluation to include a vanilla masked language modeling (MLM) objective, compare it to decoder-only masking-based training across four reversal benchmarks, and then provide a minimal mechanistic study of how these objectives succeed. We show that reversal accuracy requires a training signal that explicitly makes the source entity a prediction target, and we find little evidence that success corresponds to a single direction-agnostic representation of a fact. Instead, representation distances and linear probes are consistent with storing forward and reverse directions as distinct entries, with different indexing geometry for MLM versus decoder-only masking-based training. Our results caution that objective-level “fixes” can improve reversal behavior without necessarily inducing the kind of latent generalization one might expect from a unified concept.
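
As a minimal sketch of the distinction the abstract draws (the tokenization, entity names, and span indices below are illustrative assumptions, not the paper's actual pipeline), the following Python snippet contrasts which positions supply training signal under a standard left-to-right objective versus an MLM-style objective that masks the source entity, making it an explicit prediction target:

    # Hedged sketch: which token positions are prediction targets for the
    # source entity in the fact "A is B", under two training objectives.
    # Tokenization and entity names are illustrative assumptions only.

    tokens = ["Tom", "Cruise's", "mother", "is", "Mary", "Lee", "Pfeiffer"]
    source_span = range(0, 2)   # "Tom Cruise's" (the entity queried in reverse)
    target_span = range(4, 7)   # "Mary Lee Pfeiffer"

    # 1) Left-to-right (decoder-only) objective: each token is predicted only
    #    from its left context, so the source entity, which appears first,
    #    is never predicted conditioned on the target entity.
    next_token_targets = [(i, tokens[i]) for i in range(1, len(tokens))]

    # 2) MLM-style objective: masking the source span forces the model to
    #    reconstruct the source entity from the surrounding context,
    #    i.e., the source entity becomes an explicit prediction target.
    masked_input = [("[MASK]" if i in source_span else t) for i, t in enumerate(tokens)]
    mlm_targets = [(i, tokens[i]) for i in source_span]

    print("left-to-right targets:", next_token_targets)
    print("masked input:         ", masked_input)
    print("MLM targets:          ", mlm_targets)

The sketch only illustrates where the loss is applied under each objective; it does not implement either training procedure.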
Presenter: ~Julian_Coda-Forno1
Submission Number: 38