Tabular Data Imputation: Choose KNN over Deep Learning

Florian Lalande; Kenji Doya

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 | ICLR 2022 Submitted | Readers: Everyone

Keywords: data imputation, knn, deep learning, artificial neural networks, digital sobriety

Abstract: As databases are now ubiquitous, missing values constitute a pervasive problem for data analysis. Over the last 70 years, various imputation algorithms for tabular data have been developed and shown to be useful for estimating missing values. More recently, the infatuation with Artificial Neural Networks has led to the development of complex and powerful algorithms for data imputation. This study is the first to compare state-of-the-art deep-learning models with the well-established KNN algorithm (1951). Using real-world and generated datasets in various missing-data scenarios, we claim that the good old KNN algorithm is still competitive with (nay, better than) powerful deep-learning algorithms for tabular data imputation. This work advocates for an appropriate and reasonable use of machine learning, in a world where overconsumption, performance, and speed unfortunately often prevail over sustainability and common sense.

One-sentence Summary: A quantitative proof that artificial neural networks are overrated for tabular data imputation.

Supplementary Material: zip
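
To make the KNN baseline named in the abstract concrete, here is a minimal sketch of nearest-neighbor imputation in plain Python. It is not the authors' code: the function name `knn_impute`, the toy `data`, and the choice of a root-mean-square distance over jointly observed columns are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest donor rows.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def distance(a, b):
        # Compare only the coordinates observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        # Average the squared differences so that rows sharing few
        # columns are not automatically treated as closer.
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which column j is observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy table: the second row is missing its third value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.9],
    [8.0, 9.0, 10.0],
]
filled = knn_impute(data, k=2)
```

In this toy table the two rows closest to the incomplete row are the first and third, so the missing entry is filled with the mean of their third-column values. Distances are computed on the original rows only, so earlier imputations never influence later ones.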
