Keywords: diffusion language models, masked diffusion, in-context learning, conditional mismatch, future marginalisation, representation geometry, context compression, objective design, causal conditioning, denoising objectives, mechanistic interpretability, sequence modelling
Abstract: The choice of training objective shapes the geometry of the learned representation space by determining which conditional dependencies are consistently reinforced during optimisation. We study how causal (AR), symmetric diffusion (MDLM), and position-biased corruption objectives reshape the learned residual geometry and the associated mechanistic circuits, using a combination of loss-curvature inspection and circuit-level probes. We formalise a future marginalisation barrier: non-causal denoising objectives optimise lower-entropy future-conditioned distributions, while prefix-conditioned inference requires marginalising over latent futures. We find that this mismatch is associated with more isotropic residual geometry, weaker directional OV circuits, and reduced context compression. We introduce a position-biased corruption prior that masks later positions more frequently, encouraging suffix prediction from cleaner prefixes while preserving the tractable tokenwise diffusion-ELBO. This partially restores directional representation structure and improves context-conditioned prediction in controlled diffusion LM settings. Our results suggest that objective structure is an important determinant of representation geometry and of prefix-conditioned inference behaviour in sequence diffusion models.
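
As a rough illustration of what a position-biased corruption prior could look like, the sketch below samples independent per-token masks whose probability grows with position. The abstract does not specify the schedule, so the functional form `t ** (1 - bias * pos)`, the names `position_biased_mask_probs` and `sample_mask`, and the `bias` parameter are all assumptions made for illustration, not the authors' method. The only properties taken from the text are that later positions are masked more often and that masking stays tokenwise (each position is corrupted independently).

```python
import numpy as np


def position_biased_mask_probs(seq_len, t, bias=0.5):
    """Per-position mask probabilities at corruption level t in [0, 1].

    Hypothetical schedule: p_i = t ** (1 - bias * pos_i), with pos_i in [0, 1].
    bias = 0 gives the uniform rate p_i = t; a larger bias masks later
    positions more often.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if not 0.0 <= bias < 1.0:
        raise ValueError("bias must lie in [0, 1)")
    pos = np.linspace(0.0, 1.0, seq_len)  # normalised position, 0 = first token
    exponent = 1.0 - bias * pos           # smaller exponent => higher mask rate
    return np.power(t, exponent)


def sample_mask(seq_len, t, bias=0.5, rng=None):
    """Draw an independent Bernoulli mask per position (True = masked)."""
    rng = np.random.default_rng() if rng is None else rng
    probs = position_biased_mask_probs(seq_len, t, bias)
    return rng.random(seq_len) < probs


if __name__ == "__main__":
    print(position_biased_mask_probs(8, t=0.3, bias=0.5))
    print(sample_mask(8, t=0.3, bias=0.5, rng=np.random.default_rng(0)))
```

Because each position is masked independently, a per-token reweighting of the denoising loss remains possible under this sketch; whether this matches the weighting used in the paper's ELBO is not stated in the abstract.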
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 208