Keywords: Prediction-powered inference, Statistical estimation, Model evaluation, LLM as judge, PPI
TL;DR: This paper introduces MultiPPI, an optimal procedure for mean estimation that combines expensive, high-quality data with cheap, lower-quality proxies.
Abstract: A core challenge in modern AI model development is obtaining high-quality evaluation metrics in a cost-effective way. Such evaluation often involves tradeoffs between expensive, high-quality measurements and a variety of lower-quality proxies. We introduce Multiple-Prediction-Powered Inference (MultiPPI), a general framework for constructing statistically efficient estimates by optimally allocating resources across these diverse data sources. We provide theoretical guarantees about the minimax optimality, finite-sample performance, and asymptotic normality of the MultiPPI estimator. Through experiments across three diverse large language model (LLM) evaluation scenarios, we show that MultiPPI consistently achieves lower estimation error than existing baselines. This advantage stems from its budget-adaptive allocation strategy, which strategically combines subsets of models by learning their complex cost and correlation structures.
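The abstract builds on the single-proxy prediction-powered inference (PPI) idea that MultiPPI generalizes: estimate the mean from a large pool of cheap proxy scores, then debias it using a small set of expensive gold labels. The sketch below illustrates that classic one-proxy estimator under synthetic, hypothetical data (the variable names, sample sizes, and noise levels are illustrative assumptions, not the paper's setup).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a small expensive labeled set and a large cheap proxy pool.
n, N = 100, 10_000
true_mean = 2.0
y_labeled = rng.normal(true_mean, 1.0, n)               # expensive gold labels
proxy_labeled = y_labeled + rng.normal(0.3, 0.5, n)     # biased cheap proxy, same items
proxy_unlabeled = rng.normal(true_mean + 0.3, 1.1, N)   # proxy on the large unlabeled pool

# Classic PPI mean estimate: proxy mean on the big pool, debiased by the
# labeled-set gap between gold labels and the proxy.
ppi_estimate = proxy_unlabeled.mean() + (y_labeled - proxy_labeled).mean()

# Baseline that ignores the proxy and uses only the expensive labels.
classical_estimate = y_labeled.mean()
```

MultiPPI extends this recipe to many proxies with different costs and correlations, choosing how much budget to spend on each; the two-term structure above is the simplest special case.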
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 22419