Measuring and Reducing Train–Inference Mismatch in Discrete Diffusion Language Models

Published: 30 May 2026, Last Modified: 01 Jun 2026, SPIGM @ ICML Poster, License: CC BY 4.0
Keywords: discrete diffusion language models, train-inference mismatch, sampling diagnostics, occupancy drift, text generation, nucleus sampling, inference-time control, classifier two-sample tests
TL;DR: We introduce Mirror Gap, a process-level diagnostic for train–inference occupancy drift in discrete diffusion language models, and MirrorGate, an online sampler that uses this signal to improve the quality–compute frontier.
Abstract: Discrete diffusion language models are trained on states sampled from a forward corruption process, but at generation time they visit states induced by a reverse sampler driven by the learned denoiser. This creates a train–inference mismatch: the denoiser may be queried on states unlike those seen during training. We formalize this mismatch as the Mirror Gap, a time-indexed discrepancy between the forward marginal at each denoising time and the sampler-induced reverse marginal. We estimate projected versions of this gap using lightweight classifiers on frozen denoiser hidden states. The resulting signal is both predictive and actionable: scores computed early in sampling predict final sample quality, and using the same signal online substantially improves the quality–compute frontier. Notably, once the sampling trajectory is controlled directly, common token-level heuristics such as top-$p$ sampling can become unnecessary or even counterproductive. These findings recast diffusion text degeneration as trajectory-level drift and show that this drift can be measured and reduced.
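The abstract says projected versions of the gap are estimated with lightweight classifiers on frozen denoiser hidden states. As one illustration of that idea, the sketch below implements a generic classifier two-sample test in Python with NumPy. It assumes you already have two equal-size arrays of hidden-state features at a single denoising time t: one from states drawn from the forward corruption process, and one from states visited by the reverse sampler. The names `mirror_gap_c2st`, `fit_logistic`, and `predict`, the logistic-regression probe, and the `2*acc - 1` score are illustrative assumptions, not the paper's estimator. The score is motivated by a standard fact: with balanced classes, the best achievable classification accuracy equals 1/2 + TV/2, where TV is the total variation distance. So `2*acc - 1` on held-out data is a total-variation-style score restricted to what a linear probe on these features can detect.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, steps=500, l2=1e-3):
    """Plain gradient-descent logistic regression with the bias folded into w."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        z = np.clip(Xb @ w, -30.0, 30.0)          # clip logits to avoid exp overflow
        p = 1.0 / (1.0 + np.exp(-z))              # predicted P(label = 1)
        grad = Xb.T @ (p - y) / len(y) + l2 * w   # mean log-loss gradient + L2 term
        w -= lr * grad
    return w


def predict(w, X):
    """Hard 0/1 predictions: label 1 when the logit is positive (p > 0.5)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (Xb @ w > 0).astype(int)


def mirror_gap_c2st(forward_feats, reverse_feats, train_frac=0.5, seed=0):
    """Hypothetical classifier two-sample score at one denoising time t.

    forward_feats: (n, d) hidden states on forward-corrupted states (label 0)
    reverse_feats: (n, d) hidden states on sampler-visited states (label 1)
    Returns max(0, 2*acc - 1), where acc is held-out probe accuracy.
    """
    assert len(forward_feats) == len(reverse_feats), "2*acc - 1 assumes balanced classes"
    rng = np.random.default_rng(seed)
    X = np.vstack([forward_feats, reverse_feats])
    y = np.concatenate([np.zeros(len(forward_feats)), np.ones(len(reverse_feats))])
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    tr, te = idx[:n_train], idx[n_train:]
    # Standardize with training-split statistics only, so the test split stays held out.
    mu = X[tr].mean(axis=0)
    sd = X[tr].std(axis=0) + 1e-8
    Xs = (X - mu) / sd
    w = fit_logistic(Xs[tr], y[tr])
    acc = np.mean(predict(w, Xs[te]) == y[te])
    return max(0.0, 2.0 * acc - 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-ins for hidden states: identical vs. mean-shifted distributions.
    same = mirror_gap_c2st(rng.normal(size=(400, 8)), rng.normal(size=(400, 8)))
    shifted = mirror_gap_c2st(rng.normal(size=(400, 8)), rng.normal(loc=1.0, size=(400, 8)))
    print(f"same: {same:.3f}  shifted: {shifted:.3f}")  # same should be near 0
```

Computing this score at several times t would give a time-indexed curve in the spirit of the Mirror Gap. A linear probe keeps the test cheap and the projection explicit; a richer probe could detect more drift but is easier to overfit.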
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 189