Accelerating Diffusion Language Models via Inverse Distillation

Published: 02 Mar 2026, Last Modified: 02 Mar 2026
Venue: ReALM-GEN 2026 - ICLR 2026 Workshop
License: CC BY 4.0
Keywords: Diffusion Language Models, Discrete Diffusion, Inverse Distillation, Efficient Inference, Post-Training, Knowledge Distillation
Abstract: Diffusion Language Models (DLMs) generate text via iterative reverse diffusion, but the resulting inference latency limits practical use and makes inference-time methods such as guidance expensive. We propose $\textit{Inverse-distilled Diffusion Language Models (IDLM)}$, a post-training framework that distills a pretrained DLM into a few-step generator by extending inverse distillation to discrete token spaces. IDLM optimizes a bilevel objective: a $\textit{fake}$ diffusion model is trained on student samples with the teacher's diffusion loss, and the student is updated to maximize the teacher-fake loss gap on its own samples. In discrete settings, we (i) establish identifiability by proving a uniqueness guarantee under the SEDD, MDLM, and Duo objectives, and (ii) stabilize training with simplex-valued token outputs and differentiable reformulations of the diffusion losses. Experiments on multiple DLMs show that IDLM reduces the number of inference steps by $4\times$–$64\times$ while preserving the teacher model's entropy and generative perplexity.
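The bilevel objective described in the abstract can be written schematically as follows. This is an illustrative sketch inferred from the abstract's wording, not the paper's exact formulation: $g_\theta$ denotes the student generator, $\psi$ the frozen teacher, $\phi$ the fake diffusion model, and $\mathcal{L}_{\mathrm{diff}}$ a discrete diffusion loss (e.g., the SEDD, MDLM, or Duo objective); these symbols are assumptions.

```latex
% Inner problem: fit the fake diffusion model to the student's samples
% using the teacher's diffusion loss.
\phi^{*}(\theta) \;=\; \arg\min_{\phi}\;
  \mathbb{E}_{x \sim g_{\theta}}\!\left[\mathcal{L}_{\mathrm{diff}}(\phi; x)\right]

% Outer problem: update the student to maximize the teacher-fake loss gap
% on its own samples.
\max_{\theta}\;
  \mathbb{E}_{x \sim g_{\theta}}\!\left[
    \mathcal{L}_{\mathrm{diff}}(\psi; x)
    \;-\;
    \mathcal{L}_{\mathrm{diff}}\!\bigl(\phi^{*}(\theta); x\bigr)
  \right]
```

Intuitively, the gap vanishes when the fake model fit to the student's distribution explains the student's samples as well as the teacher does, which is why the abstract's simplex-valued outputs and differentiable loss reformulations are needed to backpropagate through $x \sim g_\theta$ in the discrete setting.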
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 89