TL;DR: We accurately sample from unnormalized densities by alternating local flow steps with non-local rejection steps.
Abstract: To sample from an unnormalized probability density function, we propose to combine continuous normalizing flows (CNFs) with rejection-resampling steps based on importance weights. We relate the iterative training of CNFs with regularized velocity fields to a JKO scheme and prove convergence of the involved velocity fields to the velocity field of the Wasserstein gradient flow (WGF). Alternating local flow steps with non-local rejection-resampling steps makes it possible to overcome local minima and the slow convergence of the WGF for multimodal distributions. Since the proposals for the rejection steps are generated by the model itself, they do not suffer from the common drawbacks of classical rejection schemes. The resulting model can be trained iteratively, reduces the reverse Kullback-Leibler (KL) loss in each step, generates i.i.d. samples, and moreover allows the learned density to be evaluated. Numerical examples show that our method yields accurate results on various test distributions, including high-dimensional multimodal targets, and significantly outperforms the state of the art in almost all cases.
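For reference, the JKO scheme mentioned above is the minimizing-movements discretization of the Wasserstein gradient flow. For the reverse KL functional with target $\pi$ and step size $\tau > 0$ it reads as follows; the paper's regularized CNF steps can be viewed as a parametrized approximation of this update (exact regularization details are in the paper):

```latex
% JKO / minimizing-movements step for the functional
% \mathcal{F}(\mu) = \mathrm{KL}(\mu \,\|\, \pi) with step size \tau > 0:
\mu_{k+1} \in \operatorname*{arg\,min}_{\mu \in \mathcal{P}_2(\mathbb{R}^d)}
  \; \mathrm{KL}(\mu \,\|\, \pi) \;+\; \frac{1}{2\tau}\, W_2^2(\mu, \mu_k)
```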
Lay Summary: We consider the problem of sampling from a probability distribution whose density is known only up to a multiplicative constant. To this end, we propose combining two different kinds of steps. The first step is based on regularized continuous normalizing flows, which locally adjust the positions of the samples. In the second step, we propose a rejection-resampling scheme based on importance weights that globally moves samples from over-represented modes to under-represented ones. Both steps are designed so that they allow the evaluation of intermediate densities, a crucial ingredient of the proposed algorithm. On the theoretical side, we link the regularized continuous normalizing flows to gradient flows in the Wasserstein space and provide a convergence analysis. Moreover, we prove that both kinds of steps reduce the Kullback-Leibler divergence to the target distribution. Finally, our paper provides numerical examples and comparisons with Markov chain Monte Carlo methods and recent neural samplers, in which our model significantly outperforms these baselines.
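To make the second kind of step concrete, below is a minimal self-normalized importance-resampling sketch in the spirit of the rejection-resampling step described above. It is a simplified stand-in, not the paper's exact scheme (which additionally keeps the resulting density tractable); the callables `log_q` and `log_p_unnorm` are hypothetical placeholders for the model's log-density and the unnormalized target log-density.

```python
import numpy as np

def importance_resample(samples, log_q, log_p_unnorm, rng=None):
    """Simplified importance-weight resampling step (illustrative sketch only).

    samples      : (n, d) array of points drawn from the current model q
    log_q        : callable, log-density of the current model for each row of `samples`
    log_p_unnorm : callable, unnormalized log target density for each row
    Returns an equally sized set of points whose empirical distribution is
    reweighted towards the target: points in over-represented modes (low weight)
    tend to be dropped, points in under-represented modes (high weight) duplicated.
    """
    rng = np.random.default_rng() if rng is None else rng
    log_w = log_p_unnorm(samples) - log_q(samples)   # log importance weights
    log_w = log_w - log_w.max()                      # stabilize before exponentiating
    w = np.exp(log_w)
    probs = w / w.sum()                              # self-normalized weights
    idx = rng.choice(len(samples), size=len(samples), replace=True, p=probs)
    return samples[idx]

# Hypothetical usage: a two-mode target sampled from an imbalanced proposal.
# Normalizing constants are omitted in the log-densities; they cancel in the weights.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10_000
    # Proposal: mixture with modes at -2 and +2, but imbalanced weights 0.9 / 0.1.
    comp = rng.uniform(size=n) < 0.9
    x = np.where(comp, rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n))[:, None]
    log_q = lambda s: np.logaddexp(np.log(0.9) - 0.5 * (s[:, 0] + 2.0) ** 2,
                                   np.log(0.1) - 0.5 * (s[:, 0] - 2.0) ** 2)
    # Unnormalized target: same modes with equal weights.
    log_p = lambda s: np.logaddexp(-0.5 * (s[:, 0] + 2.0) ** 2,
                                   -0.5 * (s[:, 0] - 2.0) ** 2)
    y = importance_resample(x, log_q, log_p, rng)
    print("fraction in the right-hand mode before/after:",
          (x[:, 0] > 0).mean(), (y[:, 0] > 0).mean())   # roughly 0.1 -> 0.5
```

In this toy example the resampling rebalances the two modes; the flow steps of the method then handle the local adjustment of positions within each mode.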
Link To Code: https://github.com/johertrich/neural_JKO_ic
Primary Area: Probabilistic Methods->Monte Carlo and Sampling Methods
Keywords: Sampling, Wasserstein Gradient Flows, Normalizing Flows, Rejection Sampling
Submission Number: 7031