Keywords: Diffusion Language Models, Discrete Diffusion, Inverse Distillation, Efficient Inference, Post-Training, Knowledge Distillation
Abstract: Diffusion Language Models (DLMs) generate text via iterative reverse diffusion, but the resulting inference latency
limits practical use and makes inference-time methods such as guidance expensive.
We propose $\textit{Inverse-distilled Diffusion Language Models (IDLM)}$, a post-training framework that distills
a pretrained DLM into a few-step generator by extending inverse distillation to discrete token spaces.
IDLM optimizes a bilevel objective: a $\textit{fake}$ diffusion model is trained on student samples with the teacher’s diffusion loss, and the student is updated to maximize the teacher-fake loss gap on its own samples.
In discrete settings, we (i) establish identifiability by proving a uniqueness guarantee under SEDD, MDLM, and Duo objectives, and (ii) stabilize training with simplex-valued token outputs and differentiable reformulations of the diffusion losses.
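One plausible formalization of the bilevel objective described above (the notation is ours, inferred from the abstract, not necessarily the paper's): with a frozen teacher $\psi$, a fake model $\phi$, a student generator $\theta$ with sample distribution $p_\theta$, and a diffusion loss $\mathcal{L}_{\mathrm{diff}}$,

$$
\phi^{*}(\theta) = \arg\min_{\phi}\ \mathbb{E}_{x \sim p_\theta}\,\mathcal{L}_{\mathrm{diff}}(\phi; x),
\qquad
\max_{\theta}\ \mathbb{E}_{x \sim p_\theta}\!\left[\mathcal{L}_{\mathrm{diff}}(\psi; x) - \mathcal{L}_{\mathrm{diff}}\!\big(\phi^{*}(\theta); x\big)\right],
$$

i.e., the inner problem fits the fake model to the student's samples, and the outer problem updates the student to widen the teacher-fake loss gap on those samples.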
Experiments on multiple DLMs show that IDLM reduces the number of inference steps by $4\times$-$64\times$
while preserving the teacher model's entropy and generative perplexity.
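To make the training loop concrete, here is a minimal, self-contained PyTorch sketch of the alternating updates the abstract describes. Everything here is an illustrative assumption rather than the paper's implementation: `Denoiser` is a toy stand-in for a DLM denoiser, `diffusion_loss` is a generic differentiable surrogate (a soft cross-entropy on simplex-valued tokens), and all sizes and hyperparameters are arbitrary.

```python
import torch
import torch.nn.functional as F

V, T = 50, 16  # toy vocab size and sequence length (illustrative only)

class Denoiser(torch.nn.Module):
    """Tiny stand-in for a diffusion language model's denoiser."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(V, V)

    def forward(self, x_simplex):      # (B, T, V) simplex-valued tokens
        return self.net(x_simplex)     # logits over the vocabulary

def diffusion_loss(model, x_simplex):
    """Generic differentiable surrogate of a discrete diffusion loss:
    soft cross-entropy of the denoiser's prediction vs. the soft sample."""
    logits = model(x_simplex)
    return -(x_simplex * F.log_softmax(logits, dim=-1)).sum(-1).mean()

teacher = Denoiser()                   # pretrained DLM, kept frozen
for p in teacher.parameters():
    p.requires_grad_(False)
fake = Denoiser()                      # trained on student samples
student = Denoiser()                   # few-step generator being distilled

opt_fake = torch.optim.Adam(fake.parameters(), lr=1e-3)
opt_student = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(10):
    # Student emits simplex-valued (soft) token outputs, keeping the
    # pipeline differentiable as the abstract's stabilization suggests.
    noise = torch.rand(8, T, V)
    x = F.softmax(student(noise / noise.sum(-1, keepdim=True)), dim=-1)

    # Inner step: fit the fake model to the student's samples.
    loss_fake = diffusion_loss(fake, x.detach())
    opt_fake.zero_grad()
    loss_fake.backward()
    opt_fake.step()

    # Outer step: maximize the teacher-fake loss gap on student samples,
    # i.e. minimize its negation with respect to the student only.
    gap = diffusion_loss(teacher, x) - diffusion_loss(fake, x)
    opt_student.zero_grad()
    (-gap).backward()
    opt_student.step()
```

In practice the inner fit would run for several steps per outer update and use the teacher's actual objective (SEDD, MDLM, or Duo); this sketch only shows the alternating structure.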
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 89