Keywords: ML over incomplete data, Data imputation for ML, Supervised ML
TL;DR: We demonstrate a new approach to learning accurate machine learning models over incomplete data with minimal or almost-minimal imputation effort.
Abstract: Missing data is common in real-world datasets and often requires significant time and effort on data repair before accurate machine learning (ML) models can be learned. In this paper, we show that imputing all missing values is not always necessary to achieve an accurate ML model. We introduce the concepts of minimal and almost-minimal repairs: subsets of the missing data items in the training data whose imputation delivers an accurate or a reasonably accurate model, respectively. Repairing only these sets can significantly reduce the time, computational resources, and manual effort required to learn models. We show that finding these sets is NP-hard for SVM and linear regression, and we propose efficient approximation algorithms with provable error bounds. Our extensive experiments indicate that the proposed algorithms substantially reduce the time and effort required to learn over incomplete datasets.
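To make the almost-minimal-repair idea concrete, here is a small illustrative sketch. It is not the paper's algorithm: the greedy cell-selection heuristic, the fixed accuracy target, and the oracle-style imputation (copying the true value back in) are all hypothetical choices for demonstration, and the model (a linear-kernel SVM from scikit-learn) is just one of the model classes the abstract mentions.

```python
# Illustrative sketch of "almost-minimal repair": impute only as many
# missing cells as needed to reach a target accuracy, instead of all of them.
# NOT the paper's method; the heuristic, target, and oracle repair are assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Toy training data with ~20% of cells missing (NaN marks a missing value).
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
mask = rng.random(X.shape) < 0.2
X_missing = np.where(mask, np.nan, X)

def score(X_rep, y):
    """Cross-validated accuracy of a linear SVM on the partially repaired data."""
    return cross_val_score(SVC(kernel="linear"), X_rep, y, cv=3).mean()

# Cheap baseline repair: fill every missing cell with 0.
X_rep = np.nan_to_num(X_missing, nan=0.0)
target = 0.90                              # hypothetical accuracy target
missing_cells = list(zip(*np.where(mask)))
rng.shuffle(missing_cells)                 # hypothetical (random) selection order

repaired = []
for (i, j) in missing_cells:
    if score(X_rep, y) >= target:          # stop once the model is accurate enough
        break
    X_rep[i, j] = X[i, j]                  # stand-in for actually imputing cell (i, j)
    repaired.append((i, j))

print(f"Imputed {len(repaired)} of {len(missing_cells)} missing cells; "
      f"accuracy {score(X_rep, y):.2f}")
```

In practice the point of the paper is to choose *which* cells to repair far more cleverly (and with provable error bounds) than the random order used above; the sketch only shows why stopping early, after repairing a small subset, can already be enough.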
Submission Number: 149