Breaking the Reversal Curse: How Masked Diffusion Models Achieve Reverse Inference

ICLR 2026 Conference Submission17945 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Discrete Diffusion Models, Reversal Curse, Natural Language Processing
TL;DR: MDMs overcome the reversal curse not by their training objective but by an architectural bias in Transformer encoders, where forward and reverse attention scores are positively correlated
Abstract: The reversal curse, the failure to answer "$B$ is $A$" after learning "$A$ is $B$", is a persistent pathology of autoregressive language models (ARMs). Masked diffusion-based language models (MDMs), however, appear to escape this curse. A seemingly plausible explanation attributes this ability to their any-order training objective, but we show that this intuition is incomplete. In particular, training to replace the mask in "$\textbf{[M]}$ is $B$" with $A$ learns the probability $p(x=A | y=B)$, which by itself does not determine the probability required to answer the reverse query, $p(y=A | x=B)$. Thus, the objective formulation alone cannot explain reversal ability. We demonstrate that the true reason lies in the architecture: in a one-layer Transformer encoder, attention scores for forward and reverse contexts are positively correlated, implicitly coupling probabilities that would otherwise be treated as unrelated. This structural bias gives MDMs a principled advantage for reverse inference. Our theory is supported by both synthetic and real-world experiments, in which MDMs consistently succeed on reverse queries that cause even strong ARMs to fail.
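The quantity at the center of the abstract, the correlation between the attention score assigned in the forward context ("$A$ is $B$", the $B$ position attending to $A$) and in the reversed context ("$B$ is $A$", the $A$ position attending to $B$), can be measured directly. The sketch below is not the authors' code: the embedding dimension, random weights, and variable names are illustrative assumptions, and a randomly initialized layer only shows how the measurement works, since the paper's claim concerns the encoders used by MDMs.

```python
# Minimal sketch (illustrative assumptions, not the authors' code): measure the
# correlation between forward and reverse attention scores in a single
# self-attention layer.
import torch

torch.manual_seed(0)
d_model, n_pairs = 64, 5000  # assumed sizes for illustration

# Query/key projections of a one-layer encoder (stand-in weights).
W_Q = torch.randn(d_model, d_model) / d_model ** 0.5
W_K = torch.randn(d_model, d_model) / d_model ** 0.5

# Random embeddings standing in for the entity tokens A and B of each fact.
A = torch.randn(n_pairs, d_model)
B = torch.randn(n_pairs, d_model)

# Forward context "A is B": unnormalized score of the B position attending to A.
forward = ((B @ W_Q) * (A @ W_K)).sum(dim=1) / d_model ** 0.5
# Reverse context "B is A": unnormalized score of the A position attending to B.
reverse = ((A @ W_Q) * (B @ W_K)).sum(dim=1) / d_model ** 0.5

# Pearson correlation between forward and reverse attention scores.
corr = torch.corrcoef(torch.stack([forward, reverse]))[0, 1]
print(f"forward/reverse attention-score correlation: {corr.item():.3f}")

# Structural note: the two scores are b^T (W_Q W_K^T) a and a^T (W_Q W_K^T) b,
# the same bilinear form with its arguments swapped; any symmetric component of
# W_Q W_K^T couples them, and they coincide exactly if the product is symmetric.
```

The snippet only shows how the coupling can be quantified; the paper's synthetic and real-world experiments are what support the claim that this correlation underlies the reverse-inference ability of MDMs.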
Primary Area: interpretability and explainable AI
Submission Number: 17945