Manifold Generalization Provably Precedes Memorization in Diffusion Models

Published: 02 Mar 2026, Last Modified: 23 Mar 2026, ICLR 2026 Workshop GRaM Poster, CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: diffusion models, score matching, manifold hypothesis, coverage, minimax rates.
TL;DR: Generating genuinely novel samples with diffusion models can be statistically far easier than estimating the full data distribution.
Abstract: Diffusion models often generate novel samples even when the learned score is only *coarse*—a phenomenon not accounted for by the standard view of diffusion training as density estimation. In this paper, we show that, under the *manifold hypothesis*, this behavior can instead be explained by coarse scores capturing the geometry of the data while discarding the fine-scale distributional structure of the population measure $\mu_{\mathrm{data}}$. Concretely, whereas estimating the full data distribution $\mu_{\mathrm{data}}$ supported on a $k$-dimensional manifold is known to require the classical minimax rate $\widetilde{O}(N^{-1/k})$, we prove that diffusion models trained with coarse scores can exploit the regularity of the manifold support and attain a near-parametric rate toward a different target distribution. This target distribution has density uniformly comparable to that of $\mu_{\mathrm{data}}$ throughout any $\widetilde{O}(N^{-\beta/(4k)})$-neighborhood of the manifold, where $\beta$ denotes the manifold regularity. Our guarantees therefore depend only on the smoothness of the underlying support, and are especially favorable when the data density itself is irregular, for instance non-differentiable. In particular, when the manifold is sufficiently smooth, we obtain that generalization—formalized as the ability to generate novel, high-fidelity samples—occurs at a statistical rate strictly faster than that required to estimate the full population distribution $\mu_{\mathrm{data}}$.
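As a reader's aid, the rate comparison stated in the abstract can be written out as follows. This is an illustrative sketch: the $N^{-1/2}$ exponent shown for the coarse-score target reflects the usual meaning of "near-parametric" and is an assumption here, not an exact claim from the paper.

```latex
% Rate comparison (illustrative sketch; the N^{-1/2} exponent stands in
% for "near-parametric" and is an assumption, not the paper's exact rate).
\[
  \underbrace{\widetilde{O}\!\left(N^{-1/k}\right)}_{\substack{\text{minimax rate for estimating } \mu_{\mathrm{data}} \\ \text{on a } k\text{-dimensional manifold}}}
  \qquad \text{vs.} \qquad
  \underbrace{\widetilde{O}\!\left(N^{-1/2}\right)}_{\substack{\text{near-parametric rate toward} \\ \text{the coarse-score target}}}
\]
```

Here the coarse-score target distribution has density uniformly comparable to that of $\mu_{\mathrm{data}}$ within an $\widetilde{O}(N^{-\beta/(4k)})$-neighborhood of the manifold, where $\beta$ is the manifold regularity; for smooth manifolds ($\beta$ large, $k$ fixed) the right-hand rate is strictly faster than the left-hand one.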
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 60