Importance Weighted Score Matching for Diffusion Samplers with Enhanced Mode Coverage

ICLR 2026 Conference Submission 205 Authors

01 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Neural Sampler, Importance Sampling
TL;DR: We correct the distribution mismatch in data-free neural sampling through importance weighting, achieving SOTA results on complex multi-modal distributions.
Abstract: Training neural samplers from unnormalized densities without target samples is challenging, particularly when comprehensive mode coverage is required. In the data-free setting, a fundamental discrepancy arises: the training objective requires expectations over the target distribution, yet we can only sample from the model-induced distribution. Previous methods ignore this mismatch, resulting in mode-seeking objectives akin to the reverse KL divergence. Recent approaches such as replay buffers mitigate the issue heuristically but lack a principled correction for the distribution mismatch. In this work, we propose \textit{Importance Weighted Score Matching}, a principled training approach for diffusion-based samplers that optimizes a mode-covering objective analogous to the forward KL divergence by re-weighting the score matching loss with tractable importance sampling estimates, thereby overcoming the absence of target distribution data. We also provide a theoretical analysis of the bias and variance of the proposed estimator and of the practical loss function used in our method. Experiments on increasingly complex multi-modal distributions, including 2D Gaussian Mixture Models with up to 120 modes and challenging particle systems with inherent symmetries, demonstrate that our approach consistently outperforms existing neural samplers across all distributional distance metrics, achieving state-of-the-art results on all benchmarks.
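To make the re-weighting idea in the abstract concrete, below is a minimal, hypothetical sketch of a self-normalized importance-weighted denoising score matching loss. It assumes access to callables `log_p_target` (unnormalized target log-density), `log_q_model` (model-induced log-density), and a score network `model_score`, all of which are illustrative names and not the paper's actual interface or training procedure.

```python
import torch

def iw_score_matching_loss(model_score, x, log_p_target, log_q_model, sigma=0.1):
    """Sketch of an importance-weighted denoising score matching loss.

    x            : batch of samples drawn from the model-induced distribution.
    log_p_target : callable returning (unnormalized) target log-densities.
    log_q_model  : callable returning model log-densities (assumed tractable here).
    sigma        : noise scale of the Gaussian perturbation kernel (illustrative).
    """
    # Self-normalized importance weights w_i ∝ p_target(x_i) / q_model(x_i),
    # computed in log-space for numerical stability.
    log_w = log_p_target(x) - log_q_model(x)
    w = torch.softmax(log_w, dim=0)  # weights sum to 1 over the batch

    # Standard denoising score matching residual on perturbed samples.
    noise = torch.randn_like(x)
    x_noisy = x + sigma * noise
    target_score = -noise / sigma          # score of the Gaussian perturbation kernel
    pred_score = model_score(x_noisy)
    per_sample = ((pred_score - target_score) ** 2).sum(dim=-1)

    # Re-weight per-sample losses so the expectation is (approximately) taken
    # under the target distribution rather than the model-induced one.
    return (w * per_sample).sum()
```

The self-normalized weights make the expectation mode-covering in spirit (as under the forward KL) even though only model samples are available; how the paper estimates these weights tractably and controls their bias and variance is the subject of its theoretical analysis.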
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 205