Early-stopping Too Late? Traces of Memorization Before Overfitting in Generative Diffusion

Published: 11 Jun 2025, Last Modified: 13 Jul 2025 · MemFM · CC BY-NC-ND 4.0
Keywords: Generative diffusion, memorization, generalization, early-stopping, overfitting
Abstract: In generative diffusion, early stopping is widely adopted as a criterion for minimizing the distance between the generated and target distributions. Yet this benefit comes with no explicit guarantee against memorization. In this work, we study the distributional fidelity of denoising diffusion probabilistic models in a controlled setup: a hierarchical data model with tractable scores and marginal probabilities. Tracking the model's generative behavior throughout training, we identify a *biased generalization* phase preceding the minimum of the test loss, in which the model increasingly favors samples with anomalously high overlap with the training data, without yet reproducing them exactly. Our results highlight a subtle failure mode of diffusion training dynamics, suggesting that standard early stopping may be insufficient to prevent distorted generalization, which sets in well before the emergence of overt memorization.
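To make the overlap-tracking diagnostic described in the abstract concrete, here is a minimal sketch, not the authors' implementation, of how one might monitor each generated sample's highest similarity to the training set across checkpoints; `checkpoints`, `model.sample`, and `X_train` are hypothetical names, and the data is assumed to be flattened into vectors.

```python
import numpy as np

def max_overlap_to_train(generated: np.ndarray, train: np.ndarray) -> np.ndarray:
    """For each generated sample, return its highest normalized overlap
    (cosine similarity) with any training example.

    generated: (n_gen, d) array of generated samples, flattened to vectors.
    train:     (n_train, d) array of training examples.
    """
    g = generated / np.linalg.norm(generated, axis=1, keepdims=True)
    t = train / np.linalg.norm(train, axis=1, keepdims=True)
    return (g @ t.T).max(axis=1)

# Hypothetical usage across training checkpoints:
# for step, model in checkpoints:
#     samples = model.sample(n=1024)                 # generated batch
#     overlaps = max_overlap_to_train(samples, X_train)
#     # Biased generalization would show up as the overlap distribution
#     # drifting upward before the test loss reaches its minimum, while
#     # exact copies (overlap ~ 1) remain absent.
#     print(step, overlaps.mean(), (overlaps > 0.99).mean())
```

Under this reading, standard early stopping (halting at the test-loss minimum) would trigger only after the overlap drift has already begun, which is the failure mode the abstract highlights.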
Submission Number: 10