The Diffusion Duality

Published: 06 Mar 2025, Last Modified: 14 Mar 2025 · ICLR 2025 DeLTa Workshop Oral · CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: Diffusion Models, Language Models, Discrete Diffusion
TL;DR: We demonstrate that discrete diffusion emerges from Gaussian diffusion, which allows adapting techniques from the latter to improve the former.
Abstract: Discrete diffusion models have been demonstrated to be surprisingly strong language models. In this work, we show that discrete diffusion language models can be further improved by adapting methods from continuous-state diffusion models. We establish a core property of uniform-state diffusion: it stems from an underlying Gaussian diffusion process. This property allows us to improve both training, by utilizing a curriculum learning strategy that reduces training variance and leads to $\mathbf{2\times}$ faster convergence, and sampling, by adapting efficient distillation methods from continuous-state diffusion models. As a result, our models surpass an autoregressive model's zero-shot perplexity on 3 out of 7 benchmarks, and we reduce the number of sampling steps by **two orders of magnitude** while preserving sample quality.
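The claimed duality — a uniform-state discrete diffusion arising from an underlying Gaussian one — can be illustrated with a minimal sketch. The construction below is an assumption for illustration, not the paper's exact formulation: a Gaussian diffusion is run over one-hot token embeddings, and taking the argmax of the noisy latent yields a discrete token whose distribution interpolates between the data token (no noise) and the uniform distribution (pure noise).

```python
import numpy as np

rng = np.random.default_rng(0)
V = 4            # illustrative vocabulary size (assumed)
x = 2            # a data token
e_x = np.eye(V)[x]  # one-hot embedding of the token

def discrete_sample(alpha_t, n=10_000):
    """Map Gaussian diffusion latents to discrete tokens via argmax.

    z_t = alpha_t * e_x + sqrt(1 - alpha_t^2) * eps is a standard
    variance-preserving Gaussian diffusion latent over the one-hot
    embedding; argmax collapses it to a token index.
    """
    eps = rng.standard_normal((n, V))
    z_t = alpha_t * e_x + np.sqrt(1.0 - alpha_t**2) * eps
    return np.argmax(z_t, axis=1)

# Low noise (alpha_t near 1): argmax almost always recovers the data token.
low_noise = discrete_sample(0.999)

# Pure noise (alpha_t = 0): the induced token distribution is uniform over V,
# matching the stationary distribution of uniform-state discrete diffusion.
high_noise = discrete_sample(0.0)
```

Intermediate `alpha_t` values interpolate between these two regimes, which is the sense in which the discrete process inherits the Gaussian one's noise schedule.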
Submission Number: 109