Fit Like You Sample: Sample-Efficient Generalized Score Matching from Fast Mixing Markov Chains

Published: 19 Jun 2023, Last Modified: 28 Jul 2023, 1st SPIGM Workshop @ ICML Poster
Keywords: theory, score matching, annealing, sample complexity
TL;DR: We show a connection between mixing times of Markov processes and statistical efficiency of score-matching losses. First analysis of statistical benefits of annealing for score matching.
Abstract: Score matching is an approach to learning probability distributions parametrized up to a constant of proportionality (e.g., EBMs). The idea is to fit the score of the distribution (i.e., $\nabla_x \log p(x)$) rather than the likelihood, thus avoiding the need to evaluate the constant of proportionality. While there is a clear algorithmic benefit, the statistical "cost" can be steep: recent work by Koehler et al. '23 showed that for distributions with poor isoperimetric properties (a large Poincaré or log-Sobolev constant), score matching is substantially less statistically efficient than maximum likelihood. However, many natural realistic distributions, e.g., multimodal distributions as simple as a mixture of two Gaussians, even in one dimension, have a poor Poincaré constant. In this paper, we show a close connection between the mixing time of an arbitrary Markov process with generator $\mathcal{L}$ and a generalized score matching loss that fits $\frac{\mathcal{O} p}{p}$. We instantiate this framework with several examples. In the special case of $\mathcal{O} = \nabla_x$ and $\mathcal{L}$ the generator of Langevin diffusion, this generalizes and recovers the results of Koehler et al. '23. When $\mathcal{L}$ is the generator of a continuous version of simulated tempering, we show that the corresponding generalized score matching loss is a Gaussian-convolution annealed score matching loss, akin to the one proposed in Song-Ermon '19. Moreover, we show that if the distribution being learned is a mixture of $K$ Gaussians in $d$ dimensions, the sample complexity of annealed score matching is polynomial in $d$ and $K$, obviating the Poincaré-constant-based lower bounds for the basic score matching loss shown in Koehler et al. '23. This is the first result characterizing the benefits of annealing for score matching, a crucial component in more sophisticated score-based approaches like Song-Ermon '19.
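For concreteness, here is a minimal sketch of the objectives the abstract refers to; the model family $p_\theta$ and the exact normalizations are assumptions for illustration, and the paper's precise (integration-by-parts) form may differ. Vanilla score matching fits the score of the model $p_\theta$ to that of the data distribution $p$,

$$L_{\mathrm{SM}}(\theta) = \tfrac{1}{2}\,\mathbb{E}_{x \sim p}\left[\big\|\nabla_x \log p_\theta(x) - \nabla_x \log p(x)\big\|^2\right],$$

while the generalized loss replaces the gradient with an operator $\mathcal{O}$ associated with a Markov process:

$$L_{\mathrm{GSM}}(\theta) = \tfrac{1}{2}\,\mathbb{E}_{x \sim p}\left[\Big\|\frac{(\mathcal{O} p_\theta)(x)}{p_\theta(x)} - \frac{(\mathcal{O} p)(x)}{p(x)}\Big\|^2\right].$$

Taking $\mathcal{O} = \nabla_x$ gives $\frac{\mathcal{O} p}{p} = \nabla_x \log p$, recovering the vanilla case.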
Submission Number: 39