Score-Based Generative Models Detect ManifoldsDownload PDF

Published: 31 Oct 2022, Last Modified: 13 Jan 2023NeurIPS 2022 AcceptReaders: Everyone
Keywords: diffusion models, generative models, score-based generative models, stochastic differential equations, score matching, generalization, memorization
TL;DR: We provide conditions under which Score-based generative models will generate samples from the support of the true data generating distribution and identify when an SGM has memorized its training data.
Abstract: Score-based generative models (SGMs) need to approximate the scores $\nabla \log p_t$ of the intermediate distributions as well as the final distribution $p_T$ of the forward process. The theoretical underpinnings of the effects of these approximations are still lacking. We find precise conditions under which SGMs are able to produce samples from an underlying (low-dimensional) data manifold $\mathcal{M}$. This assures us that SGMs are able to generate the "right kind of samples". For example, taking $\mathcal{M}$ to be the subset of images of faces, we provide conditions under which the SGM robustly produces an image of a face, even though the relative frequencies of these images might not accurately represent the true data generating distribution. Moreover, this analysis is a first step towards understanding the generalization properties of SGMs: Taking $\mathcal{M}$ to be the set of all training samples, our results provide a precise description of when the SGM memorizes its training data.
Supplementary Material: pdf
19 Replies