Keywords: Memorization, Local coverage, Diffusion
TL;DR: We show that memorization is governed not only by the total number of training samples, but also by their local coverage.
Abstract: Diffusion models are known to memorize training data, but which samples are most likely to be memorized? While memorization is often treated as a global property, in practice diffusion models simultaneously generate both memorized and novel samples.
In this work, we show that memorization is governed by local data coverage. Leveraging the connection between diffusion models and kernel density estimation (KDE), we derive a theoretical criterion that predicts whether a point is memorized or generalized based on the density of training data in its neighborhood and the overall sample complexity. In the high-dimensional limit, this leads to a sharp local transition: regions of low coverage are dominated by isolated training samples (memorization), while dense regions support interpolation and generalization.
We validate these predictions empirically, showing that memorization increases with local sparsity and that a single diffusion model produces both memorized and novel samples. Extending this framework to multi-class settings, we further show that classes with higher intra-class diversity (and thus lower local coverage) are more strongly memorized.
Our results provide a unified, local view of memorization in diffusion models, explaining when and where memorization occurs in terms of data geometry.
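As a rough illustration of the KDE view described above (a toy sketch, not the criterion derived in the paper): for a Gaussian-smoothed empirical distribution, the ideal denoiser at a noisy point is a softmax-weighted average of the training points. When one weight is close to 1, the denoiser collapses onto a single training sample, which is used here as a simple proxy for memorization. The function names, the noise level `sigma`, the dimension, and the synthetic dataset are all assumptions chosen for this example.

```python
import numpy as np

def posterior_weights(x, train, sigma):
    """Softmax weights of a Gaussian KDE over the training points, evaluated at x."""
    sq_dist = np.sum((train - x) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits -= logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def max_weight(x, train, sigma):
    """Largest posterior weight: near 1 means a single training point dominates."""
    return float(posterior_weights(x, train, sigma).max())

# Hypothetical dataset: one dense cluster plus one isolated training point.
rng = np.random.default_rng(0)
dim = 16
dense = rng.normal(0.0, 0.1, size=(200, dim))  # well-covered region near the origin
isolated = np.full((1, dim), 5.0)              # poorly covered region
train = np.vstack([dense, isolated])
sigma = 0.5                                    # assumed noise level

# Noisy queries near the isolated point and inside the dense cluster.
x_sparse = isolated[0] + sigma * rng.normal(size=dim)
x_dense = dense[0] + sigma * rng.normal(size=dim)

w_sparse = max_weight(x_sparse, train, sigma)
w_dense = max_weight(x_dense, train, sigma)
print(f"max weight near isolated point: {w_sparse:.3f}")
print(f"max weight inside dense cluster: {w_dense:.3f}")
```

In this toy setting, the query near the isolated point puts essentially all of its weight on that one sample, while the query inside the dense cluster spreads its weight over many neighbors, so the denoised output is an interpolation rather than a copy.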
Submission Number: 92