- Keywords: deep learning, imputation, tabular data
- TL;DR: Critics for the state-of-the-art of deep learning applied to tabular data imputation
- Abstract: Datasets with missing values are very common in industry applications. Missing data typically have a negative impact on machine learning models. With the rise of generative models in deep learning, recent studies proposed solutions to the problem of imputing missing values based various deep generative models. Previous experiments with Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) showed promising results in this domain. Initially, these results focused on imputation in image data, e.g. filling missing patches in images. Recent proposals addressed missing values in tabular data. For these data, the case for deep generative models seems to be less clear. In the process of providing a fair comparison of proposed methods, we uncover several issues when assessing the status quo: the use of under-specified and ambiguous dataset names, the large range of parameters and hyper-parameters to tune for each method, and the use of different metrics and evaluation methods.