Abstract: Denoising Diffusion Probabilistic Models (DDPMs) have emerged as powerful tools for generative modeling. However, their sequential computation requirements lead to significant inference-time bottlenecks. In this work, we use the connection between DDPMs and Stochastic Localization to prove that, under an appropriate reparametrization, the increments of DDPM satisfy an exchangeability property. This general insight enables near-black-box adaptation of various performance optimization techniques from autoregressive models to the diffusion setting. To demonstrate this, we introduce _Autospeculative Decoding_ (ASD), an extension of the widely used speculative decoding algorithm to DDPMs that does not require any auxiliary draft models. Our theoretical analysis shows that ASD achieves a $\tilde{O}(K^{\frac{1}{3}})$ parallel runtime speedup over the $K$-step sequential DDPM. We also demonstrate that a practical implementation of autospeculative decoding accelerates DDPM inference significantly in various domains.
Lay Summary: Generative AI tools called Denoising Diffusion Probabilistic Models (DDPMs) are excellent for tasks like creating new images or videos, but they're often very slow because they generate content step by step. Our research uncovers a hidden property of DDPMs: the order of some of their internal calculation steps can be rearranged without changing the final result. We use this insight to develop a new method called Autospeculative Decoding (ASD), which allows many of these steps to be performed simultaneously, in parallel. We mathematically prove that this makes DDPMs significantly faster at generating content, without needing any added components. We complement our proofs with practical demonstrations showing 2-4x speedups across various applications.
Primary Area: Theory->Probabilistic Methods
Keywords: DDPM, Stochastic Localization, speculative decoding
Submission Number: 15015