In Defense of Zero Imputation for Tabular Deep Learning

Mike Van Ness; Madeleine Udell

Published: 28 Oct 2023, Last Modified: 10 Nov 2023. Venue: TRL @ NeurIPS 2023 (Poster)

Keywords: missing values, imputation

TL;DR: We compare and analyze deep impute-then-predict models, finding that zero imputation almost always performs as well as they do.

Abstract: Missing values are a common problem in many supervised learning contexts. While a wealth of literature exists on missing value imputation, less work has focused on the impact of imputation on downstream supervised learning. Recently, impute-then-predict neural networks have been proposed as a powerful solution to this problem, allowing for joint optimization of imputations and predictions. In this paper, we illustrate a somewhat surprising result: multi-layer perceptrons (MLPs) paired with zero imputation perform as well as more powerful deep impute-then-predict models on real-world data. To support this finding, we analyze the results of various deep impute-then-predict models to better understand why they fail to outperform zero imputation. Our analysis sheds light on the difficulties of imputation in real-world contexts and highlights the utility of zero imputation for tabular deep learning.
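
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of zero imputation followed by an MLP forward pass. It is not the authors' implementation: the function names `zero_impute` and `mlp_forward`, the toy data, the layer sizes, and the random (untrained) weights are all assumptions made for illustration, and missing entries are assumed to be encoded as NaN.

```python
import numpy as np

def zero_impute(X):
    """Replace every missing entry (NaN) with 0.0; observed entries are unchanged."""
    X = np.asarray(X, dtype=float)
    return np.where(np.isnan(X), 0.0, X)

def mlp_forward(X, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer MLP with ReLU activation."""
    hidden = np.maximum(X @ W1 + b1, 0.0)
    return hidden @ W2 + b2

# Toy table (hypothetical): 3 rows, 2 features, NaN marks a missing value.
X_raw = np.array([[1.0, np.nan],
                  [np.nan, 2.0],
                  [3.0, 4.0]])
X_imputed = zero_impute(X_raw)

# Random, untrained weights; in practice these would be learned from labels.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
predictions = mlp_forward(X_imputed, W1, b1, W2, b2)
```

In this sketch the imputation step has no learnable parameters, so all adaptation to missingness has to happen in the MLP weights. That is the contrast with impute-then-predict models, which optimize the imputations jointly with the predictor.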

Submission Number: 10
