Keywords: semi-supervised learning, prediction-powered inference, robustness
TL;DR: This paper proposes a robust prediction-powered semi-supervised statistical learning and inference framework under data corruption.
Abstract: This paper proposes a robust prediction-powered semi-supervised statistical learning and inference framework. Existing prediction-powered inference (PPI) methods use pre-trained machine-learning models to impute labels for unlabeled samples and then calibrate the imputation bias, relying on the assumption of covariate homogeneity between the labeled and unlabeled datasets. However, violations of this homogeneity assumption, such as distribution shift and data corruption, can undermine the effectiveness of semi-supervised approaches and even cause the learning process to break down. In response, we introduce robust estimation techniques into the imputation-and-then-calibration procedure of PPI. The approach can be easily integrated with general PPI methods and improves their robustness to heterogeneity and corruption in the unlabeled set. To make full use of the labeled and unlabeled data, we also develop a cross-validation procedure for selecting the shift/contamination level. Theoretical analysis shows that our method is consistent and robust under mild conditions. Numerical simulations and real-data applications further demonstrate the robustness of the proposed method and its advantages over existing approaches.
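The imputation-and-then-calibration idea can be illustrated for the simplest target, a population mean. The following is a minimal sketch and not the estimator proposed in the paper: `ppi_mean` is the classical PPI mean estimate (average of predictions on the unlabeled set plus the average labeled residual), and `robust_ppi_mean` swaps in a symmetric trimmed mean on the unlabeled predictions purely as a stand-in for "a robust estimation technique". All function names, the `trim` parameter, and the toy data are assumptions made for illustration.

```python
import numpy as np

def ppi_mean(y_labeled, preds_labeled, preds_unlabeled):
    """Classical PPI estimate of E[Y]: imputation plus bias calibration."""
    imputed = np.mean(preds_unlabeled)               # imputation on the unlabeled set
    rectifier = np.mean(y_labeled - preds_labeled)   # calibration from labeled residuals
    return imputed + rectifier

def trimmed_mean(x, trim=0.1):
    """Symmetric trimmed mean: drop the lowest and highest `trim` fraction."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(np.floor(trim * x.size))
    return x[k:x.size - k].mean()

def robust_ppi_mean(y_labeled, preds_labeled, preds_unlabeled, trim=0.1):
    """Illustrative variant (an assumption, not the paper's method):
    replace the plain mean over the unlabeled predictions with a trimmed mean."""
    rectifier = np.mean(y_labeled - preds_labeled)
    return trimmed_mean(preds_unlabeled, trim) + rectifier

# Toy data: one grossly corrupted prediction in the unlabeled set.
y_lab = np.array([1.0, 2.0, 3.0])
f_lab = np.array([1.5, 2.5, 3.5])
f_unlab = np.array([2.0] * 9 + [100.0])

plain = ppi_mean(y_lab, f_lab, f_unlab)
robust = robust_ppi_mean(y_lab, f_lab, f_unlab, trim=0.1)
```

In this toy example a single corrupted unlabeled prediction dominates the plain estimate, while the trimmed variant discards it. A trimmed mean is only one possible choice and introduces bias for skewed distributions; how the trimming or contamination level should be chosen is exactly the kind of tuning question the abstract assigns to cross-validation.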
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 25615