Keywords: missing data imputation, missing data completion, kernel, ridge regression, non-missing feature
Abstract: Iterative imputation is a prevalent method for completing missing data: it iteratively imputes each feature by treating it as the target variable and predicting its missing values from the remaining features. However, existing iterative imputation methods have two critical defects: (1) model misspecification, where the same parametric model form is applied to every feature, which conflicts with heterogeneous data-generating processes; (2) underuse of oracle features, where all features are treated as potentially missing, so the information in fully observed features is neglected.
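The generic iterative scheme described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not KPI and not any library's implementation: missing entries are marked `None` and initialized with column means, and each incomplete feature is then re-predicted from the others with linear ridge regression over a fixed number of rounds. The helper names (`solve`, `ridge_fit`, `ridge_predict`, `iterative_impute`) and the parameters `n_rounds` and `lam` are hypothetical choices. Note that every feature uses the same linear model form, which is exactly the uniform-parametric-form assumption criticized above.

```python
from typing import List, Optional

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_fit(xs: Matrix, ys: List[float], lam: float) -> List[float]:
    """w = (X^T X + lam*I)^{-1} X^T y, with a constant 1 appended to each
    row as a bias term (the bias is penalized too, for simplicity)."""
    rows = [x + [1.0] for x in xs]
    d = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(a, b)


def ridge_predict(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x + [1.0]))


def iterative_impute(data: List[List[Optional[float]]],
                     n_rounds: int = 10, lam: float = 1e-3) -> Matrix:
    n, p = len(data), len(data[0])
    mask = [[v is None for v in row] for row in data]
    # Initialize each missing entry with the observed mean of its column.
    means = []
    for j in range(p):
        obs_vals = [row[j] for row in data if row[j] is not None]
        means.append(sum(obs_vals) / len(obs_vals))
    x = [[means[j] if mask[i][j] else data[i][j] for j in range(p)]
         for i in range(n)]
    for _ in range(n_rounds):
        for j in range(p):
            miss = [i for i in range(n) if mask[i][j]]
            if not miss:
                continue  # fully observed feature: never used as a target
            obs = [i for i in range(n) if not mask[i][j]]

            def others(i: int) -> List[float]:
                # Current values (observed or imputed) of all features except j.
                return [x[i][k] for k in range(p) if k != j]

            w = ridge_fit([others(i) for i in obs], [x[i][j] for i in obs], lam)
            for i in miss:
                x[i][j] = ridge_predict(w, others(i))
    return x


if __name__ == "__main__":
    # Third feature equals the sum of the first two; one entry is missing.
    rows = [[1.0, 2.0, 3.0], [2.0, 1.0, 3.0], [3.0, 3.0, 6.0],
            [0.0, 1.0, 1.0], [4.0, 2.0, None]]
    print(iterative_impute(rows)[4][2])  # a value close to 6.0
```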
In this work, we propose kernel point imputation (KPI), a bi-level optimization framework designed to address these issues.
The inner-level optimization learns the model form for each feature in a reproducing kernel Hilbert space (RKHS), mitigating model misspecification. The outer-level optimization uses the oracle (fully observed) features as supervision signals to refine the imputations.
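Fitting a per-feature model in an RKHS can be illustrated with kernel ridge regression (the keywords mention ridge regression and kernels), whose solution has the form f(x) = Σ_i α_i k(x_i, x) with α = (K + λI)^{-1} y. The sketch below is an assumption-laden illustration, not the KPI code: it uses an RBF kernel with a hand-picked bandwidth `gamma` and penalty `lam`, and the names `rbf`, `krr_fit`, and `krr_predict` are hypothetical. The rows where the target feature is observed serve as training data, and a row where it is missing receives the prediction from its other features; here the target is a nonlinear function (sin) of the other feature, which a single linear form would misfit.

```python
import math
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rbf(u: List[float], v: List[float], gamma: float) -> float:
    """Gaussian (RBF) kernel k(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def krr_fit(xs: Matrix, ys: List[float], gamma: float, lam: float) -> List[float]:
    """Dual coefficients alpha = (K + lam*I)^{-1} y."""
    n = len(xs)
    k = [[rbf(xs[i], xs[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(k, ys)


def krr_predict(xs: Matrix, alpha: List[float], x: List[float], gamma: float) -> float:
    """f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, xs))


if __name__ == "__main__":
    # Rows with the target observed: other feature in [0, 3], target = sin(other).
    observed_x = [[i / 4] for i in range(13)]
    observed_y = [math.sin(row[0]) for row in observed_x]
    alpha = krr_fit(observed_x, observed_y, gamma=2.0, lam=1e-6)
    # Impute the target for a row whose other feature equals 1.1.
    print(krr_predict(observed_x, alpha, [1.1], gamma=2.0), math.sin(1.1))
```

Because only the kernel and the penalty are fixed, the fitted function for each feature is not restricted to one parametric form; the bandwidth and penalty are set by hand here purely for illustration.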
Extensive experiments on real-world datasets demonstrate that KPI consistently outperforms state-of-the-art imputation methods. Code is available at https://github.com/FMLYD/kpi.git.
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 28751