Fast Language Generation through Discrete Diffusion Divergence Instruct

ICLR 2026 Conference Submission19443 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: discrete diffusion models, masked diffusion models, distillation, integral KL divergence, large language models, generative modeling

TL;DR: We distill an efficient few-step discrete diffusion language model, achieving fast inference and lower perplexity than state-of-the-art baselines.

Abstract: Fast generation of language text is the holy grail that people pursue in the AI era. In this work, we introduce **Di**screte **Di**ffusion Divergence **Instruct** (**DiDi-Instruct**), a training-based method that yields fast language generation models by initializing from a pre-trained (masked) discrete diffusion language model (dLLM). The resulting DiDi-Instruct model outperforms its dLLM counterparts and the GPT-2 baseline with $64\times$ acceleration. In the theoretical part of the paper, we build the foundation of DiDi-Instruct in a framework of integral KL divergence minimization, with practical training algorithms. We also introduce techniques such as grouped reward normalization, intermediate-state matching, and the reward-guided ancestral sampler (RGAS), which significantly improve training stability, model coverage, and inference performance. On OpenWebText, DiDi-Instruct outperforms all accelerated language generation models as well as the GPT-2 baseline and the standard dLLMs, achieving sample perplexities ranging from 62.2 (8 NFEs) to 18.4 (128 NFEs). These performance gains are accomplished with a negligible entropy loss of about $1\%$ and $20\times$ less additional training wall-clock time. We further validate the robustness and effectiveness of DiDi-Instruct through extensive ablation studies, model scaling, and the generation of discrete protein sequences. In conclusion, DiDi-Instruct is an efficient yet effective distillation method, enabling language generation in the blink of an eye. We will release our code and models along with the paper.

Supplementary Material: zip

Primary Area: generative models

Submission Number: 19443
