Objective-Induced Conditional Mismatch in Sequence Diffusion Models

Published: 29 May 2026, Last Modified: 29 May 2026
Venue: HiLD at ICML 2026 Poster
License: CC BY 4.0
Keywords: diffusion language models, masked diffusion, in-context learning, conditional mismatch, future marginalisation, representation geometry, context compression, objective design, causal conditioning, denoising objectives, mechanistic interpretability, sequence modelling
Abstract: The choice of training objective shapes the geometry of the learned representation space by determining which conditional dependencies are consistently reinforced during optimisation. We study how causal (AR), symmetric diffusion (MDLM), and position-biased corruption objectives reshape the learned residual geometry and the associated mechanistic circuits, using a combination of loss-curvature inspection and circuit-level probes. We formalise a future marginalisation barrier: non-causal denoising objectives optimise lower-entropy future-conditioned distributions, while prefix-conditioned inference requires marginalising over latent futures. We find that this mismatch is associated with more isotropic residual geometry, weaker directional OV circuits, and reduced context compression. We introduce a position-biased corruption prior that masks later positions more frequently, encouraging suffix prediction from cleaner prefixes while preserving the tractable tokenwise diffusion-ELBO. This partially restores directional representation structure and improves context-conditioned prediction in controlled diffusion LM settings. Our results suggest that objective structure is an important determinant of representation geometry and of prefix-conditioned inference behaviour in sequence diffusion models.
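As a rough illustration of what a position-biased corruption prior could look like, the sketch below assigns each position a mask probability that grows with its index, so later tokens are masked more often than earlier ones. The abstract does not specify the functional form of the prior, so the power-law weighting, the `bias` parameter, and the function names `position_biased_mask_probs` and `corrupt` are all assumptions made for illustration, not the authors' method. The sketch covers only the corruption step and does not implement the ELBO weighting.

```python
import random


def position_biased_mask_probs(seq_len, rate, bias=1.0):
    """Return per-position mask probabilities that increase with position.

    Hypothetical schedule (an assumption, not taken from the paper):
    weight_i = ((i + 1) / seq_len) ** bias, rescaled so that the mean
    probability equals `rate` whenever no value has to be clipped to 1.
    bias = 0 gives uniform (position-independent) masking.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    if seq_len == 0:
        return []
    weights = [((i + 1) / seq_len) ** bias for i in range(seq_len)]
    mean_weight = sum(weights) / seq_len
    return [min(1.0, rate * w / mean_weight) for w in weights]


def corrupt(tokens, rate, bias=1.0, mask_token="[MASK]", rng=None):
    """Independently replace each token with `mask_token` using the
    position-dependent probabilities above."""
    rng = rng or random.Random()
    probs = position_biased_mask_probs(len(tokens), rate, bias)
    return [mask_token if rng.random() < p else tok
            for tok, p in zip(tokens, probs)]


if __name__ == "__main__":
    sentence = "the cat sat on the mat and slept".split()
    print(position_biased_mask_probs(len(sentence), rate=0.3, bias=1.0))
    print(corrupt(sentence, rate=0.3, bias=1.0, rng=random.Random(0)))
```

Because each position is still masked independently, a per-token loss can in principle be reweighted position by position, which is the sense in which such a schedule is compatible with a tokenwise objective. Setting `bias=0` recovers uniform masking for comparison.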
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 208