kNNSampler: Stochastic Imputations for Recovering Missing Value Distributions

Parastoo PASHMCHI; Jérôme Benoit; Motonobu Kanagawa

Published: 01 Dec 2025, Last Modified: 01 Dec 2025, Accepted by TMLR, License: CC BY 4.0

Abstract: We study a missing-value imputation method, termed kNNSampler, that imputes a unit's missing response by sampling at random from the observed responses of the k units most similar to it in terms of the observed covariates. The method can sample unknown missing values from their distributions, quantify their uncertainty, and be readily used for multiple imputation. Unlike the popular kNNImputer, which estimates the conditional mean of a missing response given an observed covariate, kNNSampler is shown theoretically to estimate the conditional distribution of a missing response given an observed covariate. Experiments illustrate the performance of kNNSampler. The code for kNNSampler is publicly available (https://github.com/SAP/knn-sampler).
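
The sampling step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function name `knn_sampler_impute`, the use of Euclidean distance as the similarity measure, and the uniform choice among the k neighbours are all assumptions made here for illustration.

```python
import numpy as np


def knn_sampler_impute(X_obs, y_obs, X_mis, k=5, seed=None):
    """Hypothetical sketch: draw one imputed response per row of X_mis.

    X_obs, y_obs: covariates and responses of units with observed responses.
    X_mis: covariates of units whose response is missing.
    Assumes Euclidean distance and uniform sampling among the k neighbours.
    """
    rng = np.random.default_rng(seed)
    X_obs = np.asarray(X_obs, dtype=float)
    X_mis = np.asarray(X_mis, dtype=float)
    y_obs = np.asarray(y_obs)
    if not 1 <= k <= len(X_obs):
        raise ValueError("k must be between 1 and the number of observed units")

    # Squared Euclidean distances, shape (n_missing, n_observed).
    d2 = ((X_mis[:, None, :] - X_obs[None, :, :]) ** 2).sum(axis=2)

    # Indices of the k nearest observed units for each missing unit
    # (unordered within the k, which is fine for uniform sampling).
    neighbours = np.argpartition(d2, k - 1, axis=1)[:, :k]

    # Pick one neighbour uniformly at random and copy its observed response.
    pick = rng.integers(0, k, size=len(X_mis))
    return y_obs[neighbours[np.arange(len(X_mis)), pick]]


# Toy data: two well-separated clusters of observed units.
X_obs = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
y_obs = [1, 2, 3, 100, 200, 300]
X_mis = [[0.05], [10.05]]

# Multiple imputation: repeat the draw with different seeds.
draws = np.stack([knn_sampler_impute(X_obs, y_obs, X_mis, k=3, seed=s)
                  for s in range(20)])
```

Each row of `draws` is one completed set of imputations. Because every imputed value is an observed response of a neighbour, repeated draws trace out an empirical conditional distribution, whereas averaging the k neighbours' responses (as a mean-based imputer would) collapses it to a single point.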

Submission Length: Long submission (more than 12 pages of main content)

Changes Since Last Submission: This is the camera-ready version.

Code: https://github.com/SAP/knn-sampler

Supplementary Material: zip

Assigned Action Editor: ~Fabio_Stella1

Submission Number: 5871
