Keywords: benign overfitting, double descent, diffusion models, generalization, deep learning theory, random features
TL;DR: We show that benign overfitting is not possible in diffusion model training.
Abstract: Benign overfitting and double descent have come to shape our understanding of generalization in deep learning, painting a consistent picture: overfitting is not only compatible with good generalization but can actively benefit it. Since diffusion models share much of the machinery of standard deep learning, it is natural to assume that they also exhibit these properties. In this work, we show that this assumption is largely incorrect. We first establish fundamental impossibility results, showing that overfitting and good generalization cannot occur simultaneously except when the sample size grows exponentially with data dimension. We then identify a key difference between regression and score matching in a simplified setting: regression benefits from an alignment between the kernel of the empirical covariance and the target, whereas no such alignment exists in score matching, making overfitting irreparably harmful. We further examine mechanisms that prevent overfitting, identifying implicit regularization arising from time-smoothness in the score function, and early stopping in training. We support our theoretical findings with high-dimensional experiments on U-Net architectures in image generation settings. Our results reveal that generalization is governed by mechanisms distinct from those of classical settings, motivating new theory for diffusion models.
Submission Number: 222