Accelerated Diffusion Models via Speculative Sampling

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: Speculative Sampling for Diffusion Models via Reflection-Maximal Coupling
Abstract: Speculative sampling is a popular technique for accelerating inference in Large Language Models: a fast draft model generates candidate tokens, which are then accepted or rejected according to the target model's distribution. While speculative sampling was previously limited to discrete sequences, we extend it to diffusion models, which generate samples via continuous, vector-valued Markov chains. In this setting, the target model is a high-quality but computationally expensive diffusion model. We propose several drafting strategies, including a simple and effective approach that requires no trained draft model and applies out-of-the-box to any diffusion model. We demonstrate significant generation speedups on a range of diffusion models, halving the number of function evaluations while producing exact samples from the target model. Finally, we show how this procedure can also be used to accelerate Langevin diffusions for sampling from unnormalized distributions.
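The TL;DR names reflection-maximal coupling as the core acceptance mechanism. Below is a minimal, illustrative sketch of that primitive for two isotropic Gaussian transition kernels with a shared scale, the building block of a diffusion sampling step. It is not the paper's algorithm, only the standard reflection-maximal coupling it builds on: a draft sample is reused as an exact target sample when a density-ratio test passes, and otherwise the draft noise is reflected so the output is still exactly target-distributed. The function name and signature are our own.

```python
import numpy as np

def reflection_maximal_coupling(mu_draft, mu_target, sigma, rng):
    """Couple N(mu_draft, sigma^2 I) and N(mu_target, sigma^2 I).

    Returns (sample, accepted). `sample` is exactly distributed as the
    target Gaussian; `accepted` is True when the cheap draft sample
    could be reused verbatim (the speculative "hit" case).
    """
    # Draw from the draft kernel.
    x = mu_draft + sigma * rng.standard_normal(mu_draft.shape)
    # Accept the draft sample with probability min(1, q(x) / p(x)),
    # where p is the draft density and q the target density.
    log_p = -np.sum((x - mu_draft) ** 2) / (2.0 * sigma ** 2)
    log_q = -np.sum((x - mu_target) ** 2) / (2.0 * sigma ** 2)
    if np.log(rng.uniform()) < log_q - log_p:
        return x, True
    # On rejection, reflect the draft noise across the hyperplane
    # orthogonal to (mu_draft - mu_target); the result is still an
    # exact sample from the target Gaussian.
    e = mu_draft - mu_target
    e = e / np.linalg.norm(e)
    z = x - mu_draft
    y = mu_target + z - 2.0 * np.dot(e, z) * e
    return y, False
```

When the draft and target means are close, the acceptance probability is high, so most expensive target evaluations can be skipped while the output law remains exactly the target's, which is the property speculative sampling exploits.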
Lay Summary: We propose an accelerated method for the sampling of diffusion models and Langevin diffusions leveraging ideas from Large Language Models.
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Diffusion models, speculative sampling, maximal coupling, fast sampling of diffusion models
Submission Number: 2562