Keywords: Diffusion Samplers, Diffusion Bridges, Bayesian Learning, Diffusion Models
TL;DR: We show that the theoretical rationale for the LV loss is questionable in diffusion bridges and instead advocate the reverse Kullback-Leibler divergence with the log derivative trick. Additionally, we introduce exploration techniques to alleviate mode collapse.
Abstract: Diffusion bridges are a promising class of deep-learning methods for sampling from unnormalized distributions. Recent works show that the Log Variance (LV) loss consistently outperforms the reverse Kullback-Leibler (rKL) loss when the reparametrization trick is used to compute rKL gradients. While the LV loss is theoretically justified for diffusion samplers with non-learnable forward processes, where it yields gradients identical to those of the rKL loss combined with the log derivative trick, this equivalence does not hold for diffusion bridges.
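For orientation, a schematic recap of the two losses may help (our notation, not necessarily the paper's): write $\mathbb{P}^{q_\theta}$ for the path measure of the learnable sampler, $\mathbb{P}^{p}$ for that of the reference process, and $\mathbb{P}^{w}$ for an arbitrary reference measure.

```latex
% Schematic definitions (notation assumed, not taken from the paper).
% R_theta is the log Radon-Nikodym derivative between the path measure
% of the learnable sampler and that of the reference process.
\[
  R_\theta(X) = \log \frac{d\mathbb{P}^{q_\theta}}{d\mathbb{P}^{p}}(X),
  \qquad
  \mathcal{L}_{\mathrm{rKL}}(\theta)
    = \mathbb{E}_{X \sim \mathbb{P}^{q_\theta}}\big[R_\theta(X)\big],
  \qquad
  \mathcal{L}_{\mathrm{LV}}(\theta)
    = \mathrm{Var}_{X \sim \mathbb{P}^{w}}\big[R_\theta(X)\big].
\]
```

When the forward process is fixed and one chooses $w = q_\theta$ with gradients detached through the sampling distribution, $\nabla_\theta \mathcal{L}_{\mathrm{LV}}$ reduces to the log-derivative-trick estimator of $\nabla_\theta \mathcal{L}_{\mathrm{rKL}}$; with a learnable forward process, as in diffusion bridges, this identity no longer holds.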
We point out that the LV loss does not unconditionally satisfy the data processing inequality, casting doubt on its suitability for diffusion bridges. To avoid this problem, we employ the rKL loss with the log derivative trick and show that it consistently outperforms the LV loss. Furthermore, we introduce two techniques for controlling the exploration-exploitation trade-off in diffusion samplers: one based on variational annealing and the other on off-policy exploration. We validate their effectiveness on highly multimodal benchmark tasks.
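As a concrete illustration of the advocated estimator, here is a minimal PyTorch-style sketch (our own hypothetical code, not the paper's implementation; the function name, `log_q`/`log_p` arguments, and the batch-mean baseline are assumptions):

```python
import torch

def rkl_log_derivative_loss(log_q: torch.Tensor, log_p: torch.Tensor) -> torch.Tensor:
    """Surrogate loss whose gradient is the log-derivative-trick
    (score-function) estimator of grad_theta KL(q_theta || p).

    log_q: log-density of the samples under the learnable sampler
           q_theta; differentiable w.r.t. theta. Samples are assumed to
           be drawn WITHOUT reparametrized gradients flowing through them.
    log_p: unnormalized target log-density at the same samples; treated
           as constant w.r.t. theta.
    """
    # Per-sample weight log q_theta(x) - log p(x), detached so that the
    # only gradient path is through the score term log_q below.
    advantage = (log_q - log_p).detach()
    # Batch-mean baseline for variance reduction; any theta-independent
    # baseline keeps the estimator unbiased since E_q[grad log q] = 0.
    advantage = advantage - advantage.mean()
    # Gradient of this surrogate: E_q[(log q/p - b) * grad_theta log q].
    return (advantage * log_q).mean()
```

In a diffusion-bridge setting, `log_q` and `log_p` would be the accumulated path log-densities of sampled trajectories under the learnable and reference processes, respectively; the same surrogate then applies per batch of trajectories.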
Submission Number: 62