TL;DR: This paper proposes a general detect-then-impute conformal prediction framework for handling cellwise outliers in test data.
Abstract: Conformal prediction is a powerful tool for constructing prediction intervals for black-box models, providing a finite-sample coverage guarantee for exchangeable data. However, this exchangeability is compromised when some entries of the test feature are contaminated, such as in the case of cellwise outliers. To address this issue, this paper introduces a novel framework called *detect-then-impute conformal prediction*. This framework first employs an outlier detection procedure on the test feature and then utilizes an imputation method to fill in those cells identified as outliers. To quantify the uncertainty in the processed test feature, we adaptively apply the detection and imputation procedures to the calibration set, thereby constructing exchangeable features for the conformal prediction interval of the test label. We develop two practical algorithms, $\texttt{PDI-CP}$ and $\texttt{JDI-CP}$, and provide a distribution-free coverage analysis under some commonly used detection and imputation procedures. Notably, $\texttt{JDI-CP}$ achieves a finite-sample $1-2\alpha$ coverage guarantee. Numerical experiments on both synthetic and real datasets demonstrate that our proposed algorithms exhibit robust coverage properties and comparable efficiency to the oracle baseline.
Lay Summary: When test data contain corrupted values (e.g., a patient’s age mistakenly recorded as 200), traditional prediction interval methods fail because they assume clean, exchangeable data. We propose *detect-then-impute conformal prediction*: first identify outliers in the test features (e.g., by flagging implausible values), then replace them with plausible estimates (e.g., the feature mean). By applying the same detection and imputation steps to both the test data and the calibration data, our method can construct reliable prediction intervals.
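The detect-then-impute idea described above can be illustrated with a minimal sketch. This is not $\texttt{PDI-CP}$ or $\texttt{JDI-CP}$: it is a naive split-conformal variant that applies one fixed rule to every calibration row and to the test row independently, and no coverage guarantee is claimed for it. The z-score detector, the mean imputation, the linear model, the synthetic data, and all function names (`detect_then_impute`, `split_conformal_interval`) are assumptions made for illustration only.

```python
import numpy as np

def detect_then_impute(X, center, scale, z_thresh=3.0):
    """Flag cells whose z-score exceeds z_thresh; replace them with `center`.
    (Hypothetical detector/imputer chosen for illustration.)"""
    X = np.asarray(X, dtype=float)
    mask = np.abs(X - center) > z_thresh * scale   # True marks a flagged cell
    return np.where(mask, center, X), mask

def split_conformal_interval(predict, X_cal, y_cal, x_test, alpha, center, scale):
    """Split conformal interval after processing calibration and test features
    with the same detect-then-impute rule."""
    X_cal_di, _ = detect_then_impute(X_cal, center, scale)
    x_test_di, mask = detect_then_impute(x_test[None, :], center, scale)
    scores = np.abs(y_cal - predict(X_cal_di))     # absolute-residual scores
    n = len(scores)
    k = int(np.ceil((1 - alpha) * (n + 1)))        # conformal quantile rank
    q = np.inf if k > n else np.sort(scores)[k - 1]
    y_hat = predict(x_test_di)[0]
    return y_hat - q, y_hat + q, mask[0]

# Synthetic example (assumed data-generating process).
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
X_tr = rng.normal(size=(500, 3))
y_tr = X_tr @ beta_true + rng.normal(scale=0.5, size=500)
X_cal = rng.normal(size=(200, 3))
y_cal = X_cal @ beta_true + rng.normal(scale=0.5, size=200)

# Fit a least-squares model and estimate per-feature center/scale on training data.
beta_hat = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]
predict = lambda X: X @ beta_hat
center, scale = X_tr.mean(axis=0), X_tr.std(axis=0)

# A test feature with one corrupted cell (like an age recorded as 200).
x_test = np.array([0.1, -0.2, 0.3])
x_test[1] = 200.0
lo, hi, mask_test = split_conformal_interval(
    predict, X_cal, y_cal, x_test, alpha=0.1, center=center, scale=scale
)
print(f"flagged cells: {mask_test}, interval: [{lo:.2f}, {hi:.2f}]")
```

Without the detection and imputation step, the corrupted cell would shift the point prediction by roughly 400 in this example, and an interval calibrated on clean data would not account for that shift.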
Primary Area: General Machine Learning
Keywords: Conformal prediction, Cellwise outliers, Detection-imputation method, Nonexchangeability
Submission Number: 6236