The Unreasonable Effectiveness of Gaussian Score Approximation for Diffusion Models and its Applications

Published: 04 Dec 2024 · Last Modified: 04 Dec 2024 · Accepted by TMLR · License: CC BY 4.0
Abstract: Diffusion models have achieved remarkable results in multiple domains of generative modeling. By learning the score (the gradient of the log-density) of smoothed data distributions, they can iteratively generate samples from complex distributions, e.g., of natural images. The learned score function underlies their generalization capabilities, but how it relates to the score of the underlying data distribution remains largely unclear. Here, we aim to elucidate this relationship by comparing the learned scores of neural-network-based models to the scores of two kinds of analytically tractable distributions: Gaussians and Gaussian mixtures. The simplicity of the Gaussian model makes it particularly attractive from a theoretical point of view, and we show that it admits a closed-form solution and predicts many qualitative aspects of sample-generation dynamics. We claim that the learned neural score is dominated by its linear (Gaussian) approximation at moderate to high noise scales, and supply both theoretical and empirical arguments to support this claim. Moreover, the Gaussian approximation empirically holds over a larger range of noise scales than naive theory suggests, and it is preferentially learned by networks early in training. At smaller noise scales, we observe that learned scores are better described by a coarse-grained (Gaussian mixture) approximation of the training data than by the score of the training distribution, a finding consistent with generalization. Our findings enable us to precisely predict the initial phase of trained models' sampling trajectories through their Gaussian approximations. We show that this allows one to leverage the Gaussian analytical solution to skip the first 15–30% of sampling steps while maintaining high sample quality (with a near state-of-the-art FID score of 1.93 on CIFAR-10 unconditional generation). This forms the foundation of a novel hybrid sampling method, termed *analytical teleportation*, which can seamlessly integrate with and accelerate existing samplers, including DPM-Solver-v3 and UniPC. Our findings strengthen the field's theoretical understanding of how diffusion models work and suggest ways to improve their design and training.
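For intuition, below is a minimal NumPy sketch of the two ingredients the abstract describes, assuming a variance-exploding parameterization in which Gaussian data N(mu, Sigma) noised to level sigma has distribution N(mu, Sigma + sigma^2 I). The function names, noise levels, and eigendecomposition-based implementation are illustrative assumptions, not the repository's API:

```python
import numpy as np

def gaussian_score(x, mu, U, lam, sigma):
    """Score of the sigma-smoothed Gaussian N(mu, Sigma + sigma^2 I),
    i.e. grad_x log p_sigma(x) = -(Sigma + sigma^2 I)^{-1} (x - mu),
    given the eigendecomposition Sigma = U @ diag(lam) @ U.T."""
    coeff = 1.0 / (lam + sigma**2)  # eigenvalues of (Sigma + sigma^2 I)^{-1}
    return -U @ (coeff * (U.T @ (x - mu)))

def teleport(x, mu, U, lam, sigma_start, sigma_end):
    """Closed-form solution of the probability-flow ODE
    dx/dsigma = -sigma * grad_x log p_sigma(x) for Gaussian data:
    each principal component is rescaled by
    sqrt((lam_i + sigma_end^2) / (lam_i + sigma_start^2))."""
    scale = np.sqrt((lam + sigma_end**2) / (lam + sigma_start**2))
    return mu + U @ (scale * (U.T @ (x - mu)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 16
    A = rng.standard_normal((d, d))
    Sigma = A @ A.T / d                 # stand-in for the empirical data covariance
    lam, U = np.linalg.eigh(Sigma)
    mu = rng.standard_normal(d)         # stand-in for the empirical data mean

    sigma_max, sigma_mid = 80.0, 5.0    # illustrative noise levels (EDM-style range)
    x = sigma_max * rng.standard_normal(d)  # draw from the prior N(0, sigma_max^2 I)
    x_mid = teleport(x, mu, U, lam, sigma_max, sigma_mid)
    # From sigma_mid onward, hand x_mid to a learned sampler (e.g., DPM-Solver-v3
    # or UniPC), skipping the early steps that the Gaussian solution tracks closely.
```

In this picture, mu and Sigma would be estimated from the training set, and the closed-form rescaling lets one jump ("teleport") from sigma_max directly to an intermediate noise level in a single step before switching to the neural score.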
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:

### Major change
* We added figures, results, and additional methods for the new sampler benchmark experiments involving the DPM-Solver++, DPM-Solver-v3, Heun, and UniPC samplers. Specifically,
  * we added the paragraph `Experiment 2` in Section 6;
  * we added main Figure 15 and Supplementary Figures 31, 32, and 33 in Section B.9;
  * we added a description of the method in Section A.7.

### Minor change
* We fixed some formatting issues in Table 3.
* We fixed typos and polished the main text.
Code: https://github.com/Animadversio/GaussianTeleportationDiffusion
Assigned Action Editor: ~Sungwoong_Kim2
Submission Number: 2928