Noise Conditional Variational Score Distillation

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, License: CC BY 4.0
TL;DR: We propose a scalable training method for distilling pretrained diffusion models into generative denoisers.
Abstract: We propose Noise Conditional Variational Score Distillation (NCVSD), a novel method for distilling pretrained diffusion models into generative denoisers. We achieve this by revealing that the unconditional score function implicitly characterizes the score function of denoising posterior distributions. By integrating this insight into the Variational Score Distillation (VSD) framework, we enable scalable learning of generative denoisers capable of approximating samples from the denoising posterior distribution across a wide range of noise levels. The proposed generative denoisers exhibit desirable properties that enable fast generation while preserving the benefits of iterative refinement: (1) fast one-step generation through sampling from pure Gaussian noise at high noise levels; (2) improved sample quality by scaling test-time compute with multi-step sampling; and (3) zero-shot probabilistic inference for flexible and controllable sampling. We evaluate NCVSD through extensive experiments, including class-conditional image generation and inverse problem solving. By scaling test-time compute, our method outperforms teacher diffusion models and is on par with consistency models of larger sizes. Additionally, with significantly fewer function evaluations (NFEs) than diffusion-based methods, we achieve record-breaking LPIPS scores on inverse problems.
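To make properties (1) and (2) above concrete, here is a minimal PyTorch sketch of how a trained generative denoiser might be used at inference time. The `denoiser` callable, the `sigma_max` value, and the noise schedule are illustrative assumptions for this sketch, not details taken from the paper or the linked repository.

```python
# Minimal sketch (not the authors' code) of inference with a generative
# denoiser G(x_t, t) that approximates samples from p(x_0 | x_t).
# `denoiser`, `sigma_max`, and the `sigmas` schedule are hypothetical.

import torch

def one_step_sample(denoiser, shape, sigma_max=80.0, device="cpu"):
    """One NFE: draw pure Gaussian noise at the highest noise level and
    map it to a clean sample with a single denoiser call."""
    x_T = sigma_max * torch.randn(shape, device=device)
    t = torch.full((shape[0],), sigma_max, device=device)
    return denoiser(x_T, t)

def multi_step_sample(denoiser, shape, sigmas, device="cpu"):
    """More NFEs: alternate denoising and re-noising at decreasing noise
    levels, trading extra test-time compute for sample quality."""
    x = sigmas[0] * torch.randn(shape, device=device)
    for i, sigma in enumerate(sigmas):
        t = torch.full((shape[0],), sigma, device=device)
        x0 = denoiser(x, t)  # approximate sample from p(x_0 | x_t)
        if i + 1 < len(sigmas):
            # Re-noise the current estimate to the next (lower) noise level.
            x = x0 + sigmas[i + 1] * torch.randn_like(x0)
        else:
            x = x0
    return x
```

For instance, `multi_step_sample(G, (16, 3, 64, 64), sigmas=[80.0, 10.0, 1.0])` would spend three denoiser calls instead of one, illustrating how sample quality can be traded against NFEs.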
Lay Summary: Generating realistic images and solving image-related problems with artificial intelligence often relies on "diffusion models." These models are powerful but can be slow, requiring many steps to create a clear image from noise. While some methods speed this up, they often lose the ability to refine images, hurting quality and flexibility. Our paper presents Noise Conditional Variational Score Distillation (NCVSD) to make these models faster and more versatile. NCVSD efficiently condenses large diffusion models into smaller, quicker "generative denoisers." This works by recognizing that the model's core mathematical function implicitly guides how to clean up noisy images. NCVSD offers several advantages: it generates high-quality images in a single step, significantly speeding up the process. Unlike other fast methods, it still allows multiple refinement steps to improve quality, giving users control over speed versus precision. Additionally, NCVSD is flexible, enabling tasks like deblurring or completing images without task-specific retraining. Experiments show NCVSD performs on par with or better than existing models while using far fewer computational steps.
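One way to make the "core mathematical function implicitly guides how to clean up noisy images" remark concrete: assuming, for illustration, a standard variance-exploding forward kernel $q(x_t \mid x_0) = \mathcal{N}(x_t;\, x_0,\, t^2 I)$ (the paper may use a different parameterization), Bayes' rule gives

$$
\nabla_{x_0} \log p(x_0 \mid x_t)
= \nabla_{x_0} \log p(x_0) + \nabla_{x_0} \log q(x_t \mid x_0)
= \nabla_{x_0} \log p(x_0) + \frac{x_t - x_0}{t^2},
$$

so the score of the denoising posterior decomposes into the unconditional data score plus a closed-form Gaussian term. This is consistent with the abstract's claim that the unconditional score function implicitly characterizes the posterior score.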
Link To Code: https://github.com/xypeng9903/ncvsd
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: generative models, diffusion distillation, inverse problems
Submission Number: 2809