Conditional Inference Mismatch in Structured Diffusion Language Models

Published: 30 May 2026, Last Modified: 01 Jun 2026 | SPIGM @ ICML Poster | Everyone | CC BY 4.0
Keywords: diffusion language models, structured inference, conditional inference mismatch, probabilistic inference, uncertainty quantification, denoising diffusion, autoregressive models, masked language models, iterative denoising, in-context learning, adaptive corruption allocation, sequence modeling
TL;DR: Diffusion LMs are trained on future-conditioned denoising, but many tasks require prefix-conditioned inference. We show that this mismatch induces an uncertainty gap that iterative denoising can amplify, and we partially mitigate it with position-biased corruption.
Abstract: Structured diffusion language models are trained by denoising partially observed sequences, often with access to both past and future context. Many downstream uses, however, require prefix-conditioned inference: predict the next answer from a prompt while future tokens are latent. We study this train/test conditional mismatch as a probabilistic inference problem. The core observation is an entropy gap: future-conditioned denoising conditionals have lower uncertainty than the causal prefix-conditioned marginal required at test time, and recovering the latter requires marginalizing over exponentially many latent futures. Iterative denoising is an approximate inference procedure for this marginalization, but it can amplify uncertainty when it conditions on model-generated futures. We support this with one-step conditional likelihood diagnostics, full denoising traces, and controlled synthetic tasks. Finally, we introduce a simple structured corruption prior for masked diffusion LMs: mask later positions more often, so that training more frequently denoises suffix tokens from cleaner prefixes while retaining the same tractable tokenwise diffusion-ELBO form for the modified process. The intervention partially reduces the uncertainty gap and improves context-conditioned prediction, suggesting that objective structure is an important inductive bias for prefix-conditioned inference.
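To make the idea of masking later positions more often concrete, here is a minimal Python sketch of one possible position-biased corruption schedule. It is an illustration only, not the schedule used in the paper: the function names `mask_probs` and `corrupt`, the `bias` parameter, and the power-law form `t ** exponent` are all assumptions chosen so that the mask probability is non-decreasing in position and still reaches 0 at `t = 0` and 1 at `t = 1`.

```python
import random


def mask_probs(length, t, bias=1.0):
    """Hypothetical per-position mask probabilities at noise level t in [0, 1].

    Position i gets probability t ** exponent_i, where exponent_i falls
    linearly from 1 + bias (first position) to 1 (last position). Because
    t <= 1, a smaller exponent gives a larger probability, so later
    positions are masked at least as often as earlier ones.
    bias = 0 recovers uniform masking with probability t everywhere.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if bias < 0.0:
        raise ValueError("bias must be non-negative")
    if length == 1:
        return [t]
    probs = []
    for i in range(length):
        frac = i / (length - 1)               # 0 at the start, 1 at the end
        exponent = 1.0 + bias * (1.0 - frac)  # assumed schedule, not the paper's
        probs.append(t ** exponent)
    return probs


def corrupt(tokens, t, bias=1.0, mask_token="[MASK]", rng=None):
    """Independently mask each token with its position-dependent probability."""
    rng = rng or random.Random()
    probs = mask_probs(len(tokens), t, bias)
    return [mask_token if rng.random() < p else tok
            for tok, p in zip(tokens, probs)]


if __name__ == "__main__":
    sentence = "the answer to the question is forty two".split()
    print(mask_probs(len(sentence), t=0.5, bias=2.0))
    print(corrupt(sentence, t=0.5, bias=2.0, rng=random.Random(0)))
```

Because each position is still masked independently given `t`, a schedule of this kind keeps corruption tokenwise, which is the property the abstract relies on for a tractable objective. How the per-position rates enter the loss weighting is not specified here and is left out of the sketch.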
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 267