Keywords: denoising diffusions, computational bottlenecks, information-computation gap, spiked Wigner model, score matching
TL;DR: We investigate the failure of diffusion-based sampling for high-dimensional distributions exhibiting an information-computation gap.
Abstract: Denoising diffusions sample from a probability distribution $\mu$ in $\mathbb{R}^d$ by constructing a stochastic process $(\hat{\mathbf{x}}_t:t\ge 0)$ in $\mathbb{R}^d$ such that $\hat{\mathbf{x}}_0$ is easy to sample, but the distribution of $\hat{\mathbf{x}}_T$ at large $T$ approximates $\mu$. The drift $\mathbf{m}:\mathbb{R}^{d}\times\mathbb{R}\to\mathbb{R}^d$ of this diffusion process is learned by minimizing a score-matching objective.
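For concreteness, in one standard formulation (an illustrative convention on our part, not necessarily the paper's exact setup), the ideal drift is the score of the noised distribution, and the score matching objective takes the form
$$\mathcal{R}(\mathbf{m}) \;=\; \int_0^T \mathbb{E}\big\|\mathbf{m}(\mathbf{y}_t,t)-\nabla\log\mu_t(\mathbf{y}_t)\big\|_2^2\,\mathrm{d}t,$$
where $\mu_t$ denotes the law of the forward-noised process at time $t$ and $\mathbf{y}_t\sim\mu_t$.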
Is every probability distribution $\mu$ for which sampling is tractable also amenable to sampling via diffusions? We address this question by studying its relation to information-computation gaps in statistical estimation. Earlier work in this area constructs broad families of distributions $\mu$ for which sampling is easy, but approximating the drift $\mathbf{m}(\mathbf{y},t)$ is conjectured to be intractable, and provides rigorous evidence for this intractability.
We prove that this implies the failure of diffusion-based sampling. First, there exist drifts whose score matching objective is superpolynomially close to the optimal value (among polynomial-time drifts) and yet which yield samples whose distribution is far from the target. Second, any polynomial-time drift that is also Lipschitz continuous results in equally incorrect sampling.
We instantiate our results on the toy problem of sampling a sparse low-rank matrix, and empirically demonstrate the failure of diffusion-based sampling. Our work implies that caution should be exercised when adopting diffusion-based sampling in settings where other approaches are available.
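As a purely illustrative sketch (the exact distribution, scaling, and sparsity level studied in the paper may differ), a sparse spiked Wigner instance of the kind suggested by the keywords can be generated directly, which is what makes straightforward sampling easy:

```python
import numpy as np

def sample_sparse_spiked_wigner(n, eps=0.05, lam=2.0, rng=None):
    """Draw one sample Y = (lam/n) * v v^T + W / sqrt(n).

    v is a sparse spike: each coordinate is nonzero with probability eps,
    and W is a symmetric Gaussian (GOE-like) noise matrix.
    All parameter choices here are illustrative, not the paper's.
    """
    rng = np.random.default_rng(rng)
    # Sparse Rademacher spike: +/-1 on a random support of density eps.
    support = rng.random(n) < eps
    v = np.where(support, rng.choice([-1.0, 1.0], size=n), 0.0)
    # Symmetric Gaussian noise matrix.
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2)
    return (lam / n) * np.outer(v, v) + W / np.sqrt(n)

# Direct sampling is trivial; the conjectured hard step is approximating
# the drift (score of the noised law) required by a diffusion sampler.
Y = sample_sparse_spiked_wigner(n=200)
print(Y.shape)
```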
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 22089