Sparse GAIN: Imputation Methods to Handle Missing Values with Sparse Initialization

Brian Patrick van Oers, Işıl Baysal Erez, Maurice van Keulen

Published: 01 Jan 2026, Last Modified: 26 Jan 2026 · CC BY-SA 4.0
Abstract: Missing data is one of the major issues in data analysis. Many deep learning-based imputation methods have been proposed to handle datasets with missing values, and Generative Adversarial Imputation Nets (GAIN) are among the most popular. Imputation methods based on GAN models can be applied to complex datasets. However, like prediction models, imputation methods based on deep learning algorithms can be costly in terms of computational complexity and energy consumption. Deep learning models with sparse initialization have outperformed densely initialized models on general machine learning tasks, in both accuracy and efficiency. In this work, we apply sparse initialization to missing-data imputation with GAIN. Our results show that an imputer initialized with a highly sparse generator outperforms Dense GAIN on datasets with 20% of values missing under the Missing Completely at Random (MCAR) mechanism. At 90% generator sparsity, we observed a 7.8% performance improvement on a medical domain dataset, and at 99% sparsity a 7.7% improvement on an image dataset. Moreover, not only did performance improve, but the approach also proved significantly more computationally efficient in terms of floating-point operations (FLOPs). The code is available at github.com/BrianPvanOers/S-GAIN.
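To make the setup concrete, below is a minimal NumPy sketch of the two ingredients the abstract names: a generator layer with sparse (random-mask) initialization at a given sparsity level, and an MCAR missingness mask at a given missing rate. The function names (sparse_glorot_init, mcar_mask) and the Glorot-uniform weight scheme are illustrative assumptions, not the actual S-GAIN implementation.

    import numpy as np

    rng = np.random.default_rng(42)

    def sparse_glorot_init(fan_in, fan_out, sparsity, rng):
        """Glorot-uniform weights with a random binary mask: a fraction
        `sparsity` of the entries is zeroed out at initialization.
        Illustrative sketch; S-GAIN's actual scheme may differ."""
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        weights = rng.uniform(-limit, limit, size=(fan_in, fan_out))
        mask = (rng.random((fan_in, fan_out)) >= sparsity).astype(weights.dtype)
        return weights * mask, mask

    def mcar_mask(n_rows, n_cols, miss_rate, rng):
        """MCAR missingness: each cell is dropped independently of the
        data with probability `miss_rate` (1 = observed, 0 = missing)."""
        return (rng.random((n_rows, n_cols)) >= miss_rate).astype(float)

    # 90% sparse generator layer: only ~10% of the weights are nonzero,
    # so a forward pass needs roughly 10% of the dense FLOPs if the
    # zero entries are skipped.
    w, w_mask = sparse_glorot_init(64, 64, sparsity=0.9, rng=rng)
    print(f"nonzero weight fraction: {(w != 0).mean():.3f}")

    # 20% MCAR missingness, matching the reported experimental setting.
    m = mcar_mask(1000, 64, miss_rate=0.2, rng=rng)
    print(f"observed cell fraction:  {m.mean():.3f}")

Under this reading, the FLOPs savings reported in the abstract follow directly from the mask: at 99% generator sparsity, only about 1% of the generator's weight multiplications remain.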