When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We analyze when diffusion probability flow converges to training data or more general manifold points, using shallow ReLU denoisers with minimal L2 norm.
Abstract: While diffusion models generate high-quality images via probability flow, the theoretical understanding of this process remains incomplete. A key question is when probability flow converges to training samples or more general points on the data manifold. We analyze this by studying the probability flow of shallow ReLU neural network denoisers trained with minimal $\ell^2$ norm. For intuition, we introduce a simpler score flow and show that for orthogonal datasets, both flows follow similar trajectories, converging to a training point or a sum of training points. However, early stopping by the diffusion time scheduler allows probability flow to reach more general manifold points. This reflects the tendency of diffusion models to both memorize training samples and generate novel points that combine aspects of multiple samples, motivating our study of such behavior in simplified settings. We extend these results to obtuse simplex data and, through simulations in the orthogonal case, confirm that probability flow converges to a training point, a sum of training points, or a manifold point. Moreover, memorization decreases when the number of training samples grows, as fewer samples accumulate near training points.
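To make the setting concrete, here is a minimal, hypothetical sketch of probability-flow sampling in the orthogonal-data case. The paper analyzes minimum-norm shallow ReLU denoisers; as a stand-in, the sketch below uses the exact MMSE denoiser for a uniform mixture of Gaussians (a softmax-weighted average of training points), and integrates the variance-exploding probability-flow ODE with Euler steps. All function names and the noise schedule are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def mmse_denoiser(x, sigma, train_points):
    # Stand-in for a trained denoiser (assumption, not the paper's
    # shallow ReLU network): the exact MMSE denoiser for a uniform
    # Gaussian mixture centered at the training points.
    d2 = ((train_points - x) ** 2).sum(axis=1)
    w = np.exp(-(d2 - d2.min()) / (2.0 * sigma ** 2))
    w /= w.sum()
    return w @ train_points

def probability_flow(x0, train_points, sigmas):
    # Euler integration of the probability-flow ODE in its
    # variance-exploding form, dx/dsigma = (x - D(x, sigma)) / sigma,
    # run from large noise level down to small.
    x = x0.copy()
    for hi, lo in zip(sigmas[:-1], sigmas[1:]):
        x = x + (lo - hi) * (x - mmse_denoiser(x, hi, train_points)) / hi
    return x

# Two orthogonal training points, mirroring the orthogonal-data setting.
train = np.array([[1.0, 0.0], [0.0, 1.0]])
sigmas = np.geomspace(5.0, 1e-3, 200)  # hypothetical time scheduler
x_init = np.array([2.0, 0.3])
x_final = probability_flow(x_init, train, sigmas)
```

Running the schedule to a very small terminal noise level drives `x_final` toward the nearest training point (memorization); stopping the schedule early, at a larger terminal sigma, leaves the iterate at a weighted combination of training points, which is the early-stopping effect the abstract describes.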
Lay Summary: Diffusion models are a popular type of generative AI that create realistic images through a gradual process of refining random noise into structure. Although they perform remarkably well, researchers still don’t fully understand why these models are so effective. A central open question is whether diffusion models are simply memorizing training images or generating new ones by blending features from multiple examples. To explore this, we study a simplified version of a diffusion model using small neural networks. We examine how these models behave over time, observing whether they return to exact training examples or converge on new, intermediate points that mix features from several images. Our findings show that both memorization and creative generalization can occur, depending on how long the generation process is allowed to run. These insights help explain how diffusion models can produce both familiar-looking and entirely novel images, and offer a better understanding of the trade-offs in their behavior.
Primary Area: Deep Learning->Theory
Keywords: Probability flow, Score flow, Diffusion models, Memorization, Denoising, Neural networks
Submission Number: 12414