TL;DR: Enabling mode seeking and convergence in the distillation of diffusion models, addressing the low-fidelity, blurry results of Score Distillation Sampling (SDS).
Abstract: We present *mean-shift distillation*, a novel diffusion distillation technique that provides a provably good proxy for the gradient of the diffusion output distribution. The proxy is derived directly from mean-shift mode seeking on the distribution, and we show that its extrema are aligned with the distribution's modes. We further derive an efficient product distribution sampling procedure to evaluate the gradient.
Our method is formulated as a drop-in replacement for score distillation sampling (SDS), requiring neither model retraining nor extensive modification of the sampling procedure. We show that it exhibits superior mode alignment as well as improved convergence in both synthetic and practical setups, yielding higher-fidelity results when applied to both text-to-image and text-to-3D applications with Stable Diffusion.
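To make the underlying idea concrete, below is a minimal sketch of classical mean-shift mode seeking on a toy 2-D sample set, which is the procedure the abstract builds on. This is only an illustration: the function names, Gaussian-kernel bandwidth, and toy data are assumptions for the example, and it does not reproduce the paper's actual distillation gradient or its product distribution sampling procedure.

```python
# Minimal sketch of classical mean-shift mode seeking with a Gaussian kernel.
# Illustrates the update the abstract refers to; NOT the paper's distillation
# gradient. Bandwidth, names, and toy data are illustrative assumptions.
import numpy as np

def mean_shift_step(x, samples, bandwidth=0.5):
    """One mean-shift update: move x toward the kernel-weighted mean of samples."""
    diffs = samples - x                                           # (N, D) offsets to each sample
    w = np.exp(-0.5 * np.sum(diffs**2, axis=1) / bandwidth**2)    # Gaussian kernel weights
    return (w[:, None] * samples).sum(axis=0) / w.sum() - x       # mean-shift vector

def seek_mode(x0, samples, bandwidth=0.5, iters=100, tol=1e-6):
    """Iterate mean-shift updates until the shift is negligible (a local mode)."""
    x = x0.astype(float).copy()
    for _ in range(iters):
        shift = mean_shift_step(x, samples, bandwidth)
        x += shift
        if np.linalg.norm(shift) < tol:
            break
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy bimodal "output distribution": two Gaussian clusters in 2-D.
    samples = np.concatenate([rng.normal(-2.0, 0.3, (200, 2)),
                              rng.normal(+2.0, 0.3, (200, 2))])
    print(seek_mode(np.array([0.5, 0.5]), samples))  # converges toward the nearby mode
```

The mean-shift vector always points toward a kernel-density mode, which is the property the abstract leverages when replacing the SDS gradient with a mode-seeking proxy.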
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: diffusion models, score distillation sampling, meanshift, mode-seeking, classifier-free guidance
Submission Number: 797