The Unreasonable Effectiveness of Gaussian Score Approximation for Diffusion Models and its Applications
Abstract: Score-based models have achieved remarkable results in the generative modeling of multiple domains. By learning the gradient of smoothed data distributions, they can iteratively generate samples from complex distributions, e.g., natural images.
The learned score function enables their generalization capability, but its structure and relation to the underlying data manifold remain largely unclear.
Here, we aim to identify such structures through a normative analysis of diffusion models equipped with the exact scores of tractable distributions, e.g., Gaussian and Gaussian mixture distributions.
We find that a diffusion model using the Gaussian score admits a closed-form solution, which predicts many qualitative aspects of the sample generation dynamics.
Further, we claim that, at high noise scales, the learned neural score is dominated by the linear score of the Gaussian approximation of the data, while at lower noise scales it more closely resembles the score of a coarse-grained approximation of the data, e.g., a Gaussian mixture.
We supply theoretical arguments for this claim and empirically show that the Gaussian approximation is accurate for a surprisingly wide range of noise in practical diffusion models.
We further study the score learning dynamics and find that diffusion models learn the simpler Gaussian score preferentially.
Our findings enable us to precisely predict the initial diffusion trajectory using the Gaussian analytical solution, and by skipping this initial phase we can accelerate image sampling by 15-30% while maintaining image quality (with a near state-of-the-art FID score of 1.93 on CIFAR-10 unconditional generation). Our findings strengthen the field's theoretical understanding of how diffusion models work and suggest ways to improve the design and training of diffusion models.
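The linear Gaussian score at the heart of the abstract is standard: if the data follow N(mu, Sigma) and noise of scale sigma is added (VE-style, x = x0 + sigma*z), the smoothed marginal is N(mu, Sigma + sigma^2 I), so its score is linear in x. A minimal numerical sketch (not the paper's code; all names are illustrative) checks this closed form against a finite-difference gradient of the log-density:

```python
# Hedged sketch: exact score of a sigma-smoothed Gaussian.
# If x0 ~ N(mu, Sigma) and x = x0 + sigma * z with z ~ N(0, I),
# then x ~ N(mu, Sigma + sigma^2 I) and
#   score(x, sigma) = -(Sigma + sigma^2 I)^{-1} (x - mu).
import numpy as np

def gaussian_score(x, mu, Sigma, sigma):
    """Closed-form score of the sigma-smoothed Gaussian N(mu, Sigma)."""
    cov = Sigma + sigma**2 * np.eye(len(mu))
    return -np.linalg.solve(cov, x - mu)

# Verify against a finite-difference gradient of log p_sigma(x).
rng = np.random.default_rng(0)
d = 3
mu = rng.normal(size=d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)   # well-conditioned SPD covariance
x, sigma = rng.normal(size=d), 0.5

def log_p(x):
    cov = Sigma + sigma**2 * np.eye(d)
    diff = x - mu
    return -0.5 * (diff @ np.linalg.solve(cov, diff)
                   + np.linalg.slogdet(2 * np.pi * cov)[1])

eps = 1e-5
fd = np.array([(log_p(x + eps * e) - log_p(x - eps * e)) / (2 * eps)
               for e in np.eye(d)])
assert np.allclose(gaussian_score(x, mu, Sigma, sigma), fd, atol=1e-5)
```

Because this score is linear, the reverse diffusion ODE/SDE under it is a linear system and can be integrated in closed form, which is what the paper exploits to skip the initial phase of sampling.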
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Sungwoong_Kim2
Submission Number: 2928