Diffusion-based Annealed Boltzmann Generators: benefits, pitfalls and hopes

TMLR Paper7222 Authors

28 Jan 2026 (modified: 06 Feb 2026). Under review for TMLR. License: CC BY 4.0
Abstract: Sampling configurations at thermodynamic equilibrium is a central challenge in statistical physics. Boltzmann Generators (BGs) address this problem by pairing a generative model with a Monte Carlo (MC) correction scheme, yielding asymptotically consistent samples from an unnormalized target density. However, most existing BGs rely on classical MC mechanisms such as importance sampling, which (i) impose strong constraints on the backbone model (typically requiring exact and efficient likelihood evaluation) and (ii) suffer from severe scalability issues in high-dimensional, multi-modal settings. This work investigates BGs built around annealed Monte Carlo (aMC) schemes, which mitigate the limitations of classical MC by bridging a simple reference distribution to the target through a sequence of intermediate densities. In this context, diffusion models (DMs) are particularly appealing backbones: they are powerful generative models and naturally induce density paths that have been leveraged in prior aMC-based methods. We provide an empirical meta-analysis of this DM-based aMC-BG design choice on controlled yet challenging synthetic benchmarks based on multi-modal Gaussian mixtures, varying inter-mode separation, number of modes, and dimensionality. To disentangle learning effects from inference effects, we first study an idealized setting in which the DM is perfectly learned, and then turn to realistic settings where the DM is trained from data. Even in the idealized regime, we find that standard aMC integrations of DMs that rely only on first-order stochastic denoising kernels systematically fail in the proposed scenarios. In contrast, incorporating second-order denoising kernels can substantially improve performance when the required covariance information is available.
Motivated by this gap, we propose an alternative aMC integration based on deterministic first-order transport maps derived from DMs; empirically, this approach consistently outperforms its stochastic first-order counterpart, albeit at increased computational cost. Overall, while results in the perfect-learning regime suggest that exploiting DM-induced dynamics within aMC is a promising route to building effective BGs, our experiments with learned DMs show that DM–aMC combinations still struggle to produce accurate BGs in practice. We attribute this limitation primarily to inaccuracies in DM log-density estimation.
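To make the contrast between plain importance sampling and annealed Monte Carlo concrete, here is a minimal sketch on a hypothetical 2-D, two-mode Gaussian mixture. It assumes a geometric annealing path and random-walk Metropolis kernels, which are generic aMC choices swapped in purely for illustration; they are not the DM-induced density paths or denoising kernels studied in the paper. All function names (`log_target`, `importance_sampling`, `annealed_importance_sampling`, ...) and parameter values are assumptions chosen for this example.

```python
import numpy as np


def log_gauss(x, mean, std):
    """Row-wise log-density of an isotropic Gaussian N(mean, std^2 I)."""
    d = x.shape[-1]
    sq = np.sum((x - mean) ** 2, axis=-1)
    return -0.5 * sq / std**2 - d * np.log(std) - 0.5 * d * np.log(2.0 * np.pi)


def log_target(x, sep=3.0):
    """Hypothetical normalized target: equal-weight unit Gaussians at +/- sep * (1, ..., 1)."""
    m = sep * np.ones(x.shape[-1])
    return np.logaddexp(log_gauss(x, m, 1.0), log_gauss(x, -m, 1.0)) + np.log(0.5)


def ess(logw):
    """Effective sample size (sum w)^2 / sum w^2, computed stably from log-weights."""
    w = np.exp(logw - logw.max())
    return w.sum() ** 2 / np.sum(w**2)


def log_mean_exp(logw):
    """Log of the mean weight, i.e. the estimate of log Z (target / reference)."""
    m = logw.max()
    return m + np.log(np.mean(np.exp(logw - m)))


def importance_sampling(n, d, ref_std, rng):
    """Plain IS: draw from the Gaussian reference, weight by target / reference."""
    x = ref_std * rng.standard_normal((n, d))
    return x, log_target(x) - log_gauss(x, 0.0, ref_std)


def annealed_importance_sampling(n, d, ref_std, n_steps, n_mh, step, rng):
    """AIS along the geometric path gamma_b proportional to ref^(1-b) * target^b."""

    def log_ref(x):
        return log_gauss(x, 0.0, ref_std)

    def log_gamma(x, b):
        return (1.0 - b) * log_ref(x) + b * log_target(x)

    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = ref_std * rng.standard_normal((n, d))
    logw = np.zeros(n)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Incremental weight gamma_b(x) / gamma_{b_prev}(x) at the current state.
        logw += (b - b_prev) * (log_target(x) - log_ref(x))
        # Random-walk Metropolis moves that leave gamma_b invariant.
        for _ in range(n_mh):
            prop = x + step * rng.standard_normal(x.shape)
            log_acc = log_gamma(prop, b) - log_gamma(x, b)
            accept = np.log(rng.uniform(size=n)) < log_acc
            x = np.where(accept[:, None], prop, x)
    return x, logw


rng = np.random.default_rng(0)
n, d, ref_std = 2000, 2, 4.0
_, logw_is = importance_sampling(n, d, ref_std, rng)
_, logw_ais = annealed_importance_sampling(n, d, ref_std, n_steps=50, n_mh=5, step=0.5, rng=rng)
# Both densities are normalized, so the true log Z is 0.
print(f"IS : ESS = {ess(logw_is):7.1f}, log Z estimate = {log_mean_exp(logw_is):+.3f}")
print(f"AIS: ESS = {ess(logw_ais):7.1f}, log Z estimate = {log_mean_exp(logw_ais):+.3f}")
```

Both estimators are unbiased for Z, so both log Z estimates should sit near 0, while the ESS gives a rough view of weight degeneracy. Raising `d` or `sep` in this toy setup is one quick way to probe how plain importance sampling degrades in the high-dimensional, multi-modal regime the abstract describes; no claim is made here about the paper's actual numbers.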
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Julius_Berner1
Submission Number: 7222