Keywords: Diffusion, Wavelets, Interpretability
TL;DR: An analytically solvable wavelet-based score parameterization expresses diffusion scores in terms of data moments, revealing which correlations matter across noise levels.
Abstract: Diffusion models have had remarkable success over the last decade in generating diverse, visually plausible images. These models work by transforming the data to a centered Gaussian and then learning the reverse process by training a neural network to approximate the score of the underlying distribution. A variety of architectures, from CNNs to U-Nets to transformers, have been used as the score-approximation network in diffusion modeling. We propose an analytically solvable parameterization of the score function using an expansion in a wavelet basis. In particular, we derive interpretable optimal score functions in a 2D orthogonal wavelet basis in terms of the moments of the data distribution. We use this parameterization to provide an architecture-agnostic, moment-based analysis that reveals which attributes of the data distribution tend to matter most for denoising. Our score machine is flexible enough to partially mimic the relevant inductive biases of multiple architectures, including U-Nets and CNNs, taking a step towards understanding why different score architectures can exhibit distinct generative behavior. Because our score is solvable in terms of the moments of the data, we can begin to understand how the data distribution interacts with the score network to produce the behavior observed in diffusion models.
Submission Number: 126
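
Illustrative sketch (not from the submission): the abstract's idea of a score written in terms of data moments in an orthogonal wavelet basis can be made concrete in the simplest case. The code below assumes Gaussian data and additive noise of standard deviation sigma, for which the noised score is exactly -(Sigma + sigma^2 I)^{-1} (x - mu). It uses a one-level 2D Haar basis on 2x2 patches as a stand-in orthogonal wavelet basis. The names haar_2x2 and gaussian_score, the synthetic data, and the noise model are all assumptions made for illustration, not the paper's parameterization.

```python
import numpy as np

def haar_2x2():
    """Orthogonal one-level 2D Haar basis for flattened 2x2 patches (4x4 matrix)."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    return np.kron(h, h)

def gaussian_score(x, mean, cov, sigma):
    """Score of N(mean, cov) after adding N(0, sigma^2 I) noise (Gaussian assumption)."""
    noisy_cov = cov + sigma**2 * np.eye(len(mean))
    return -np.linalg.solve(noisy_cov, x - mean)

# Hypothetical data: 1000 correlated 4-pixel patches.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))
mean = data.mean(axis=0)           # first moment
cov = np.cov(data, rowvar=False)   # second central moment

W = haar_2x2()
x = rng.normal(size=4)
sigma = 0.5

# Score computed from pixel-space moments.
score_pixel = gaussian_score(x, mean, cov, sigma)
# Score computed from wavelet-space moments (W mean, W cov W^T).
score_wavelet = gaussian_score(W @ x, W @ mean, W @ cov @ W.T, sigma)

# Because W is orthogonal, the two agree up to the change of basis.
print(np.allclose(score_wavelet, W @ score_pixel))
```

Under these assumptions only the first two moments enter the score. Any dependence on higher moments, as discussed in the abstract, lies outside this Gaussian sketch.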