Particles Don’t Care About Z: Towards Scaling Entropy Estimation of Unnormalized Densities

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Stein Variational Gradient Descent, Sampling, Variational Inference, Entropy
TL;DR: We propose a variational method to estimate the entropy of distributions known up to a normalization constant.
Abstract: Computing the differential entropy of distributions known only up to a normalization constant is a long-standing challenge with broad theoretical and practical significance. While variational inference is the most scalable approach for density approximation from samples, its potential in settings where only the unnormalized density is available remains largely underexplored. The central difficulty lies in constructing variational distributions that simultaneously ($i$) exploit the structure of the unnormalized density, ($ii$) are expressive enough to capture complex target distributions, ($iii$) remain computationally tractable, and ($iv$) support efficient sampling. Recently, \citet{messaoud2024s} introduced P-SVGD, a particle-based variational method leveraging Stein Variational Gradient Descent dynamics, which satisfies all of these constraints and demonstrates promising results in low-dimensional setups. We show, however, that P-SVGD does not scale to high dimensions due to fundamental algorithmic flaws: ($i$) misdiagnosed sensitivity to SVGD hyperparameters, ($ii$) violation of the global invertibility assumption in the entropy derivation, and ($iii$) omission of a critical trace-of-Hessian term, along with sub-optimal heuristics, including a divergence-based sampling check that induces mode collapse and loose informal bounds of no practical value. These issues severely limit both correctness and scalability. We propose MET-SVGD, a principled extension of P-SVGD that addresses these flaws and provides a general framework for SVGD hyperparameter selection with global invertibility and convergence guarantees, enabling accurate and scalable entropy estimation in high-dimensional setups. Empirically, on entropy estimation benchmarks, MET-SVGD achieves up to 12$\times$ and 16$\times$ accuracy improvements over P-SVGD and the most scalable baselines from the SVGD literature, respectively. On CIFAR-10 energy-based image generation, it improves FID by $80.4\%$ over P-SVGD and achieves 64$\times$ better stability. In maximum-entropy reinforcement learning, MET-SVGD yields up to $16\%$ higher returns than P-SVGD. We will make our code publicly available: \url{https://tinyurl.com/2esyfx8j}.
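For intuition, the following is a minimal sketch of how entropy can be tracked along standard SVGD dynamics; the notation (kernel $k$, step size $\epsilon$, particle count $n$, target $p$, particle distribution $q_t$) is the usual SVGD convention and is not taken from the submission itself. Each SVGD step applies the map $T(x) = x + \epsilon\,\phi(x)$ with velocity field
$$
\phi(x) \;=\; \frac{1}{n}\sum_{j=1}^{n}\Big[\,k(x_j, x)\,\nabla_{x_j}\log p(x_j) \;+\; \nabla_{x_j} k(x_j, x)\,\Big],
$$
and, provided $T$ is globally invertible, the change-of-variables formula propagates the entropy as
$$
H(q_{t+1}) \;=\; H(q_t) \;+\; \mathbb{E}_{q_t}\!\big[\log\big|\det\!\big(I + \epsilon\,\nabla\phi(x)\big)\big|\big].
$$
Note that at a particle $x_i$, the $j = i$ summand depends on $x_i$ through $\nabla\log p(x_i)$, so $\nabla\phi(x_i)$ contains a $\tfrac{1}{n}\,k(x_i, x_i)\,\nabla^2\log p(x_i)$ term; its trace is the trace-of-Hessian contribution the abstract refers to, and the invertibility of $I + \epsilon\,\nabla\phi$ is where step-size (hyperparameter) conditions enter.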
Supplementary Material: pdf
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 24751