mDAE: modified Denoising AutoEncoder for missing data imputation

TMLR Paper3052 Authors

23 Jul 2024 (modified: 28 Oct 2024) · Rejected by TMLR · CC BY 4.0
Abstract: This paper introduces mDAE, a method for missing data imputation based on the Denoising AutoEncoder (DAE). The method differs from a standard DAE in two respects: a modified loss function and a straightforward procedure for choosing the hyper-parameters. An ablation study shows the relevance of both modifications on several UCI Machine Learning Repository datasets, for several types and proportions of missing values. The numerical study is complemented by a comparison with eight other methods (four standard and four more recent), in which mDAE performs well. A criterion called Mean Distance to Best (MDB) is proposed to compare the methods globally across all datasets. According to this criterion, mDAE consistently ranks among the top three methods (along with SoftImpute and missForest), while the four more recent methods systematically rank last. The Python code of the numerical study will be available on GitHub so that results can be reproduced or extended to other datasets and methods.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~antonio_vergari2
Submission Number: 3052