Keywords: Discrete Diffusion model, Reversal Curse, Natural Language Processing
TL;DR: MDMs overcome the reversal curse not by their training objective but by an architectural bias in Transformer encoders, where forward and reverse attention scores are positively correlated
Abstract: The reversal curse, the failure to answer "$B$ is $A$" after learning "$A$ is $B$",
is a persistent pathology of autoregressive language models (ARMs).
Masked diffusion-based language models (MDMs), however, appear to escape this curse.
A seemingly plausible explanation attributes this ability to their any-order training objective,
but we show this intuition is incomplete.
In particular, training to replace the mask in "$\textbf{[M]}$ is $B$" with $A$ learns the conditional probability $p(x=A | y=B)$,
where $x$ and $y$ denote the first and second slots of the template; this quantity places no constraint on the probability required to answer the reverse query, $p(y=A | x=B)$.
Thus, the objective formulation alone cannot explain reversal ability.
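To make this distinction concrete, the following is a minimal counting sketch (the toy facts, names, and counting scheme are illustrative assumptions, not the paper's setup): on a corpus of forward-only "$x$ is $y$" facts, the statistics that the masked objective fits pin down $p(x | y)$ exactly, while leaving $p(y | x=B)$ untouched, because $B$ never appears in the $x$ slot.

```python
from collections import Counter, defaultdict

# Hedged toy illustration (not the paper's experimental setup): a corpus of
# forward-only facts "x is y", with made-up (x, y) pairs.
corpus = [("tom", "desc_tom"), ("ann", "desc_ann"), ("bob", "desc_bob")]

# The masked objective on "[M] is B" is fit by the empirical p(x | y=B):
# counts of which x co-occurs with each second-slot token y.
x_given_y = defaultdict(Counter)
# The reverse query "B is [M]" needs p(y | x=B): counts of which y follows
# each token appearing in the first slot x.
y_given_x = defaultdict(Counter)

for x, y in corpus:
    x_given_y[y][x] += 1
    y_given_x[x][y] += 1

print(x_given_y["desc_tom"])  # Counter({'tom': 1}): p(x='tom' | y='desc_tom') = 1
print(y_given_x["desc_tom"])  # Counter(): p(y | x='desc_tom') is left unconstrained
```

A model that merely matched these statistics would answer every forward query and still have no basis for any reverse one, which is the sense in which the objective alone cannot explain reversal ability.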
We demonstrate that the true reason lies in the architecture: in a one-layer Transformer encoder,
attention scores for forward and reverse contexts are positively correlated,
implicitly coupling probabilities that would otherwise be treated as unrelated.
This structural bias gives MDMs a principled advantage for reverse inference.
Our theory is supported by both synthetic and real-world experiments,
where MDMs consistently succeed on reverse queries that cause even strong ARMs to fail.
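To illustrate the quantity behind the architectural claim, here is a minimal, hedged probe (a single randomly initialized head, no positional encodings, no softmax, no training; the dimensions and sample count are arbitrary assumptions): it estimates the correlation between the pre-softmax score of token $A$ attending to token $B$ and that of $B$ attending to $A$.

```python
import numpy as np

# Illustrative probe only: one randomly initialized attention head.
# It measures whether the pre-softmax score of A attending to B
# correlates with the score of B attending to A across token pairs.
rng = np.random.default_rng(0)
d = 64                                    # embedding / head dimension (arbitrary)
W_Q = rng.normal(scale=d ** -0.5, size=(d, d))
W_K = rng.normal(scale=d ** -0.5, size=(d, d))
M = W_Q @ W_K.T                           # bilinear form behind query-key scores

forward, reverse = [], []
for _ in range(10_000):
    e_a = rng.normal(size=d)              # embedding of token A
    e_b = rng.normal(size=d)              # embedding of token B
    forward.append(e_a @ M @ e_b)         # A (query) attends to B (key)
    reverse.append(e_b @ M @ e_a)         # B (query) attends to A (key)

# Pearson correlation between forward and reverse scores across token pairs.
print(np.corrcoef(forward, reverse)[0, 1])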
Primary Area: interpretability and explainable AI
Submission Number: 17945