Keywords: diffusion models, score denoising, neural network dynamics, high-dimensional inference, simplicity bias
TL;DR: We develop a theory of score denoising in diffusion models that explains how probability distributions are learned sequentially, from low-order (easy) to high-order (hard) statistics.
Abstract: While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood.
We address this issue by first showing empirically that standard diffusion models trained on natural images exhibit a simplicity bias: they learn simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, in which we precisely control both the pair-wise and higher-order correlations of the inputs. We identify a scalar invariant of the model, which we call the diffusion information exponent in analogy to related invariants in other learning paradigms, that governs the sample complexity of learning pair-wise and higher-order correlations.
Using this invariant, we prove that the denoiser learns simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant is linear if pair-wise and higher-order statistics share a correlated latent structure.
Our work describes a key mechanism for how diffusion models can learn distributions of increasing complexity and suggests that correlated latent structures may be at the core of how diffusion models are able to learn at low sample complexity.
Submission Number: 13