Robust Prediction-Powered Inference under Data Corruption

Mengyuan Wang; Chengde Qian; Haojie Ren; Changliang Zou

20 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · Readers: Everyone · License: CC BY 4.0

Keywords: semi-supervised learning, prediction-powered inference, robustness

TL;DR: This paper proposes a robust prediction-powered semi-supervised statistical learning and inference framework under data corruption.

Abstract: This paper proposes a robust prediction-powered semi-supervised statistical learning and inference framework. Existing prediction-powered inference (PPI) methods use pre-trained machine-learning models to impute unlabeled samples and then calibrate the imputation bias, under the assumption of covariate homogeneity between the labeled and unlabeled datasets. However, violations of the homogeneity assumption, such as distribution shifts and data corruption, can undermine the effectiveness of semi-supervised approaches and even break down the learning process. In response, we introduce robust estimation techniques into the imputation-and-then-calibration procedure of PPI. The approach can be easily integrated with general PPI methods and improves their robustness against heterogeneity and corruption in the unlabeled set. To make full use of the labeled and unlabeled data, a cross-validation procedure is also developed for selecting the shift/contamination level. Theoretical analysis shows that our method is consistent and robust under mild conditions. Numerical simulations and real-data applications also demonstrate the robustness and superiority of the proposed method.
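As a rough illustration of the imputation-and-then-calibration idea mentioned in the abstract, the sketch below computes the classical PPI point estimate of a mean: the average model prediction on the unlabeled set, plus the average labeled-set residual as a bias correction. This is the standard, non-robust estimator, not the robust method proposed in the paper, whose details the abstract does not give. The function name `ppi_mean` and the toy numbers are assumptions made for illustration only.

```python
import numpy as np


def ppi_mean(y_labeled, pred_labeled, pred_unlabeled):
    """Classical (non-robust) PPI point estimate of a mean.

    Hypothetical helper for illustration; not the paper's robust estimator.
    """
    y_labeled = np.asarray(y_labeled, dtype=float)
    pred_labeled = np.asarray(pred_labeled, dtype=float)
    pred_unlabeled = np.asarray(pred_unlabeled, dtype=float)

    # Imputation: average model prediction over the unlabeled set.
    imputed = pred_unlabeled.mean()
    # Calibration: average residual on the labeled set corrects model bias.
    rectifier = (y_labeled - pred_labeled).mean()
    return imputed + rectifier


# Toy data (made up): the model over-predicts by 0.5 on the labeled set.
y_lab = [1.0, 2.0, 3.0]
f_lab = [1.5, 2.5, 3.5]
f_unlab = [2.5, 3.5]

clean = ppi_mean(y_lab, f_lab, f_unlab)  # 3.0 + (-0.5) = 2.5

# One corrupted unlabeled prediction moves the estimate arbitrarily far,
# since the imputation term is a plain average.
corrupted = ppi_mean(y_lab, f_lab, f_unlab + [100.0])
```

Because the imputation term is an unweighted mean over the unlabeled set, a single corrupted point is enough to shift the estimate, which is the kind of sensitivity the abstract cites as motivation for a robust variant.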

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 25615
