Distilled Diffusion Language Models

ICLR 2025 Conference Submission 580 Authors

13 Sept 2024 (modified: 13 Oct 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: diffusion language models, discrete diffusion, distillation
TL;DR: Distilling a pre-trained autoregressive language model into a diffusion-based language model with the proposed Target Concrete Score objective.
Abstract: Transformer-based Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their autoregressive nature forces sequential token-by-token decoding, leading to inefficiencies during inference. Furthermore, autoregressive language models lack inherent self-correction abilities, which hinders their capacity to refine and improve generated content without relying on external prompting or retraining techniques. In contrast, diffusion-based models offer the advantage of fast parallel generation through iterative refinement, while leveraging bi-directional attention to utilize the full context at once. However, diffusion language models have so far been unable to match the performance of their autoregressive counterparts. This motivates us to explore the possibility of distilling a pre-trained autoregressive (AR) language model (teacher) into a non-autoregressive diffusion (non-AR) language model (student), combining the best of both worlds. In this work, we present Target Concrete Score (TCS) distillation, a theoretically grounded framework that bridges the autoregressive and diffusion paradigms. TCS distillation is broadly applicable to both discrete and continuous diffusion models, with any pre-trained autoregressive teacher model. We propose techniques that make TCS distillation scalable and efficient for transformer-based models, and show how it can both improve pre-trained diffusion language models and train new models from scratch. Through comprehensive experiments on language modeling tasks, we demonstrate the effectiveness of the proposed methods.
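The abstract does not spell out the Target Concrete Score objective itself, so the following is only a minimal, assumption-laden sketch of the general AR-teacher-to-masked-diffusion-student distillation setup it describes: a frozen autoregressive teacher supplies per-position token distributions, and a bidirectional diffusion student is trained to match them at corrupted (masked) positions via a generic KL surrogate, not the authors' actual TCS loss. The callables `teacher` and `student`, the `mask_id` token, and the alignment choices are all hypothetical placeholders.

```python
# Hypothetical sketch: distilling an AR teacher into a masked-diffusion student.
# This is NOT the paper's Target Concrete Score objective; it is a generic
# distillation surrogate written for illustration only.
import torch
import torch.nn.functional as F


def distillation_step(teacher, student, tokens, mask_id):
    """One distillation step on a batch of token ids of shape (B, L)."""
    B, L = tokens.shape
    device = tokens.device

    # Sample a corruption level t ~ U(0, 1) per sequence; mask each token with prob t.
    t = torch.rand(B, 1, device=device)
    is_masked = torch.rand(B, L, device=device) < t
    noisy = torch.where(is_masked, torch.full_like(tokens, mask_id), tokens)

    # Frozen AR teacher: causal logits, where position i predicts the token at i+1.
    with torch.no_grad():
        teacher_logits = teacher(tokens)                        # (B, L, V)
    teacher_probs = F.softmax(teacher_logits[:, :-1], dim=-1)   # targets for positions 1..L-1

    # Bidirectional diffusion student: predicts the clean token at every position
    # of the corrupted sequence, conditioned on the noise level t.
    student_logits = student(noisy, t)                          # (B, L, V)
    student_logprobs = F.log_softmax(student_logits[:, 1:], dim=-1)

    # Distill only where the student actually has to denoise (masked positions).
    weight = is_masked[:, 1:].float()
    kl = F.kl_div(student_logprobs, teacher_probs, reduction="none").sum(-1)
    return (kl * weight).sum() / weight.sum().clamp_min(1.0)
```

Note that, for simplicity, this sketch conditions the teacher on the clean sequence and matches marginals position by position; how the teacher's autoregressive factorization is actually turned into targets for the diffusion student is precisely what the paper's TCS framework specifies.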
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 580