Prediction-Powered Adaptive Shrinkage Estimation

Published: 01 May 2025, Last Modified: 18 Jun 2025, Venue: ICML 2025 poster, License: CC BY 4.0
TL;DR: We introduce Prediction-Powered Adaptive Shrinkage (PAS), a novel method to estimate multiple means using few labeled data points and black-box model predictions.
Abstract: Prediction-Powered Inference (PPI) is a powerful framework for enhancing statistical estimates by combining limited gold-standard data with machine learning (ML) predictions. While prior work has demonstrated PPI’s benefits for individual statistical problems, modern applications require answering numerous parallel statistical questions. We introduce Prediction-Powered Adaptive Shrinkage ($\texttt{PAS}$), a method that bridges PPI with empirical Bayes shrinkage to improve estimation of multiple means. $\texttt{PAS}$ debiases noisy ML predictions $\textit{within}$ each task and then borrows strength $\textit{across}$ tasks by using those same predictions as a reference point for shrinkage. The amount of shrinkage is determined by minimizing an unbiased estimate of risk, and we prove that this tuning strategy is asymptotically optimal. Experiments on both synthetic and real-world datasets show that $\texttt{PAS}$ adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications.
Lay Summary: Scientists and data analysts often face a common challenge: they have a lot of data features (like images of galaxies or product details) but only a small amount of reliable, "gold-standard" labeled information (like which galaxies have spirals or actual user ratings). This scarcity makes it hard to answer many related questions accurately, such as finding the fraction of spiral galaxies in different clusters or the average ratings for many different products. We developed a new statistical method called Prediction-Powered Adaptive Shrinkage (PAS). PAS cleverly combines these limited gold-standard labels with predictions from modern machine learning (ML) models. First, for each specific question (e.g., for one galaxy cluster), it uses the ML predictions to make initial estimates more precise. Then, it "borrows strength" across all the different questions by using these same ML predictions as a common reference point, intelligently adjusting how much to rely on them based on their estimated quality. Our method, PAS, allows researchers to get more accurate answers even when high-quality labeled data is scarce for each individual question. It automatically adapts to how good the ML predictions are, outperforming existing approaches in diverse real-world scenarios, from astronomy to analyzing customer reviews. This helps extract more reliable insights from complex datasets with many parallel questions.
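Illustrative sketch: the two steps described in the abstract (debias predictions within each task, then shrink across tasks toward the predictions) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation. It uses the classical PPI mean estimator and a standard unbiased risk estimate that treats the shrinkage target as fixed, ignoring any correlation between the debiased estimate and the target. The function names (`ppi_mean`, `shrink`, `sure`, `tune_lambda`) and the grid search over a single global parameter `lam` are assumptions made for this example; the linked repository contains the actual method.

```python
from statistics import mean, variance


def ppi_mean(y_labeled, f_labeled, f_unlabeled):
    """Classical PPI mean estimate for one task (needs >= 2 points per sample).

    Returns the prediction mean on unlabeled data plus a bias correction
    computed on labeled data, with a plug-in variance estimate.
    """
    residuals = [y - f for y, f in zip(y_labeled, f_labeled)]
    estimate = mean(f_unlabeled) + mean(residuals)
    var = variance(f_unlabeled) / len(f_unlabeled) + variance(residuals) / len(residuals)
    return estimate, var


def shrink(estimates, variances, targets, lam):
    """Shrink each task's estimate toward its target.

    Weight w = lam / (lam + variance): noisier estimates are pulled harder.
    Assumes lam + variance > 0 for every task.
    """
    out = []
    for est, var, tgt in zip(estimates, variances, targets):
        w = lam / (lam + var)
        out.append(w * est + (1 - w) * tgt)
    return out


def sure(estimates, variances, targets, lam):
    """Unbiased estimate of average squared-error risk for a given lam.

    Simplifying assumption: targets are treated as fixed constants.
    """
    terms = []
    for est, var, tgt in zip(estimates, variances, targets):
        w = lam / (lam + var)
        terms.append(var * (2 * w - 1) + (1 - w) ** 2 * (est - tgt) ** 2)
    return mean(terms)


def tune_lambda(estimates, variances, targets, grid):
    """Pick the grid value of lam with the smallest estimated risk."""
    return min(grid, key=lambda lam: sure(estimates, variances, targets, lam))


if __name__ == "__main__":
    # Two hypothetical tasks with made-up numbers, for illustration only.
    tasks = [
        ([1.0, 3.0, 2.0], [0.5, 2.5, 2.0], [1.0, 2.0, 3.0, 2.0]),
        ([4.0, 6.0, 5.0], [4.5, 5.0, 5.5], [5.0, 4.0, 6.0, 5.0]),
    ]
    stats = [ppi_mean(y, fl, fu) for y, fl, fu in tasks]
    estimates = [s[0] for s in stats]
    variances = [s[1] for s in stats]
    targets = [mean(fu) for _, _, fu in tasks]
    lam = tune_lambda(estimates, variances, targets, [0.0, 0.1, 1.0, 10.0])
    print(lam, shrink(estimates, variances, targets, lam))
```

In this sketch, a small selected `lam` means the predictions look reliable and the estimates are pulled strongly toward them, while a large `lam` leaves the debiased estimates nearly unchanged.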
Link To Code: https://github.com/listar2000/prediction-powered-adaptive-shrinkage
Primary Area: General Machine Learning
Keywords: prediction powered inference, shrinkage estimation
Submission Number: 5268