Keywords: Statistical Hypothesis Testing; Neural Posterior Estimation; Likelihood-Free Inference; Bayesian Methods; Simulation-Based Inference
TL;DR: We propose a new method to test equality between the true and estimated posterior distributions, establishing necessary and sufficient conditions for distributional equivalence, with both theoretical guarantees and practical scalability.
Abstract: We consider the problem of validating whether a neural posterior estimate (NPE) $q(\theta \mid x)$ is an accurate approximation to the true, unknown posterior $p(\theta \mid x)$. Existing methods for evaluating the quality of an NPE are largely derived from classifier-based tests or divergence measures, but these suffer from several practical drawbacks. As an alternative, we introduce the *Conditional Localization Test* (**CoLT**), a principled method designed to detect discrepancies between $p(\theta \mid x)$ and $q(\theta \mid x)$ across the full range of conditioning inputs. Rather than relying on exhaustive comparisons or density estimation at every $x$, CoLT learns a localization function that adaptively selects points $\theta_l(x)$ where the neural posterior $q$ deviates most strongly from the true posterior $p$ for that $x$. This approach is particularly advantageous in typical simulation-based inference settings, where only a single draw $\theta \sim p(\theta \mid x)$ from the true posterior is observed for each conditioning input, but where the neural posterior $q(\theta \mid x)$ can be sampled an arbitrary number of times. Our theoretical results establish necessary and sufficient conditions for assessing distributional equality across all $x$, offering both rigorous guarantees and practical scalability. Empirically, we demonstrate that CoLT not only performs better than existing methods at comparing $p$ and $q$, but also pinpoints regions of significant divergence, providing actionable insights for model refinement. These properties position CoLT as a state-of-the-art solution for validating neural posterior estimates.
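To make the localization idea concrete, below is a minimal, hypothetical PyTorch sketch of how a test of this flavor might be assembled: a small network proposes probe points $\theta_l(x)$, and a kernel-based statistic compares the single true-posterior draw against NPE samples around each probe. The kernel discrepancy, the architecture, and every name (`Localizer`, `colt_statistic`, `fit_localizer`, `bandwidth`) are illustrative assumptions, not the paper's actual construction.

```python
# Hypothetical sketch of a CoLT-style localization test (not the authors' code).
# Assumption: a Gaussian-kernel local discrepancy probed at theta_l(x).
import torch
import torch.nn as nn

def gaussian_kernel(a, b, bandwidth=1.0):
    # Smooth "local density probe" centered at the localization point.
    return torch.exp(-((a - b) ** 2).sum(-1) / (2 * bandwidth ** 2))

class Localizer(nn.Module):
    """Maps a conditioning input x to a probe location theta_l(x)."""
    def __init__(self, x_dim, theta_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, theta_dim)
        )

    def forward(self, x):
        return self.net(x)

def colt_statistic(loc_net, x, theta_true, theta_npe):
    # theta_true: one draw per x from p(theta | x), shape (n, d).
    # theta_npe:  m draws per x from q(theta | x), shape (n, m, d).
    probe = loc_net(x)                                               # (n, d)
    k_true = gaussian_kernel(theta_true, probe)                      # (n,)
    k_npe = gaussian_kernel(theta_npe, probe.unsqueeze(1)).mean(1)   # (n,)
    diff = k_true - k_npe  # mean zero under H0: p = q, for ANY localizer
    return diff.mean() / (diff.std() / diff.shape[0] ** 0.5 + 1e-8)  # t-like

def fit_localizer(loc_net, x, theta_true, theta_npe, steps=500, lr=1e-3):
    # Push the localizer toward regions of maximal discrepancy; the final
    # statistic should then be evaluated on held-out pairs for a valid test.
    opt = torch.optim.Adam(loc_net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -colt_statistic(loc_net, x, theta_true, theta_npe).abs()
        loss.backward()
        opt.step()
    return loc_net
```

Note that under $H_0: p(\theta \mid x) = q(\theta \mid x)$, the per-input kernel differences have expectation zero regardless of where the localizer places its probes, so training the localizer on one split and computing the statistic on another concentrates power where $q$ deviates most without invalidating the test; this is a sketch of the general recipe, not the paper's specific guarantee.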
Primary Area: Probabilistic methods (e.g., variational inference, causal inference, Gaussian processes)
Submission Number: 11114