Abstract: Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a powerful family of generative models that yield high-fidelity samples and competitive log-likelihoods across a range of domains, including image and speech synthesis. Key advantages of DDPMs include ease of training, in contrast to generative adversarial networks, and speed of generation, in contrast to autoregressive models. However, DDPMs typically require hundreds to thousands of steps to generate a high-fidelity sample, making them prohibitively expensive for high-dimensional problems. Fortunately, DDPMs allow trading generation speed for sample quality by adjusting the number of refinement steps during inference. Prior work has improved generation speed by handcrafting the time schedule through trial and error. In our work, we view the selection of inference time schedules as an optimization problem, and introduce an exact dynamic programming algorithm that finds the log-likelihood-optimal discrete time schedules for any pre-trained DDPM. Our method exploits the fact that the evidence lower bound (ELBO) can be decomposed into separate KL divergence terms, so that, given any computation budget, we discover the time schedule that maximizes the training ELBO exactly. Our method is efficient, has no hyper-parameters of its own, and can be applied to any pre-trained DDPM with no retraining. We discover inference time schedules requiring as few as 32 refinement steps, while sacrificing less than 0.1 bits per dimension compared to the default 4,000 steps used on an ImageNet 64x64 model.
One-sentence Summary: We present a simple procedure that discovers log-likelihood-optimal strides for score-based generative models.
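
The abstract describes the search only at a high level, so the following is a minimal sketch of how a budget-constrained dynamic program over a decomposable objective might look, not the authors' implementation. It assumes a hypothetical precomputed table `cost[s][t]` holding the (negative) ELBO term incurred by stepping directly from time `t` to an earlier time `s`; the function name `optimal_schedule`, the table layout, and the toy cost used below are all illustrative assumptions. In practice the entries would presumably be estimated from the pre-trained model.

```python
import math


def optimal_schedule(cost, budget):
    """Find the cheapest path from time T down to time 0 in exactly `budget` steps.

    cost[s][t] is a hypothetical per-step cost (e.g. one KL term of the
    negative ELBO) for jumping from time t directly to time s, with s < t.
    Returns (total_cost, schedule), where schedule runs from T to 0.
    """
    T = len(cost) - 1
    if not 1 <= budget <= T:
        raise ValueError("budget must satisfy 1 <= budget <= T")

    # best[k][t]: minimum cost of reaching time 0 from time t in exactly k steps.
    best = [[math.inf] * (T + 1) for _ in range(budget + 1)]
    # prev[k][t]: the time s chosen as the next stop on that optimal path.
    prev = [[-1] * (T + 1) for _ in range(budget + 1)]
    best[0][0] = 0.0

    for k in range(1, budget + 1):
        for t in range(1, T + 1):
            for s in range(t):
                candidate = best[k - 1][s] + cost[s][t]
                if candidate < best[k][t]:
                    best[k][t] = candidate
                    prev[k][t] = s

    # Backtrack from (budget, T) to recover the chosen time steps.
    schedule = [T]
    t = T
    for k in range(budget, 0, -1):
        t = prev[k][t]
        schedule.append(t)
    return best[budget][T], schedule


# Toy cost (an assumption for illustration): a jump of length d costs d**2.
T = 4
toy_cost = [[(t - s) ** 2 if s < t else math.inf for t in range(T + 1)]
            for s in range(T + 1)]
total, schedule = optimal_schedule(toy_cost, budget=2)
print(total, schedule)
```

With this toy quadratic cost and a budget of two steps, the sketch selects the evenly spaced schedule `[4, 2, 0]`. Because the objective is a sum of independent per-step terms, the table can be filled in O(budget · T²) time, and the search needs no tuning parameters beyond the step budget itself.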