Differentiable and Scalable Generative Adversarial Models for Data Imputation

Yangyang Wu, Jun Wang, Xiaoye Miao, Wenjia Wang, Jianwei Yin

Published: 01 Jan 2024, Last Modified: 11 Apr 2025. IEEE Trans. Knowl. Data Eng. 2024. License: CC BY-SA 4.0

Abstract: Data imputation has been extensively explored as a solution to the missing data problem. However, the dramatically increasing volume of incomplete data makes imputation models computationally infeasible in many real-life applications. In this paper, we propose an effective, scalable imputation system named ${\sf SCIS}$ that significantly speeds up the training of differentiable generative adversarial imputation models with accuracy guarantees on large-scale incomplete data. ${\sf SCIS}$ consists of two modules: differentiable imputation modeling (DIM) and sample size estimation (SSE). DIM leverages a new masking Sinkhorn divergence function to make an arbitrary generative adversarial imputation model differentiable. For such a differentiable imputation model, SSE estimates an appropriate sample size that ensures the user-specified imputation accuracy of the final model. Moreover, ${\sf SCIS}$ can also accelerate autoencoder-based imputation models. Extensive experiments on several real-life large-scale datasets demonstrate that our proposed system accelerates generative adversarial model training by 6.23x. Using around 1.27% of the samples, ${\sf SCIS}$ yields accuracy competitive with state-of-the-art imputation methods in much shorter computation time.
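
To make the idea of a masked Sinkhorn divergence more concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes one plausible reading of "masking": both the imputed batch and the incomplete batch are zeroed at unobserved positions before a standard debiased Sinkhorn divergence (entropic optimal transport with a squared Euclidean cost) is computed. All names (`entropic_ot`, `sinkhorn_divergence`, `masked_sinkhorn_divergence`) and the defaults `eps` and `n_iters` are hypothetical and chosen for illustration only.

```python
import math


def _logsumexp(values):
    # Numerically stable log(sum(exp(v))) over a list of floats.
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def _sq_dist(u, v):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v))


def entropic_ot(xs, ys, eps=0.5, n_iters=200):
    """Entropy-regularized OT cost between two uniform point clouds,
    computed with log-domain Sinkhorn iterations (dual objective value)."""
    n, m = len(xs), len(ys)
    cost = [[_sq_dist(x, y) for y in ys] for x in xs]
    log_a, log_b = -math.log(n), -math.log(m)  # uniform weights
    f, g = [0.0] * n, [0.0] * m                # dual potentials
    for _ in range(n_iters):
        f = [-eps * _logsumexp([log_b + (g[j] - cost[i][j]) / eps
                                for j in range(m)]) for i in range(n)]
        g = [-eps * _logsumexp([log_a + (f[i] - cost[i][j]) / eps
                                for i in range(n)]) for j in range(m)]
    return sum(f) / n + sum(g) / m


def sinkhorn_divergence(xs, ys, eps=0.5, n_iters=200):
    # Debiased form: OT(x, y) - 0.5 * OT(x, x) - 0.5 * OT(y, y).
    return (entropic_ot(xs, ys, eps, n_iters)
            - 0.5 * entropic_ot(xs, xs, eps, n_iters)
            - 0.5 * entropic_ot(ys, ys, eps, n_iters))


def apply_mask(rows, masks):
    # Keep observed entries (mask == 1); replace unobserved entries with 0.0.
    return [[v if m else 0.0 for v, m in zip(row, mask)]
            for row, mask in zip(rows, masks)]


def masked_sinkhorn_divergence(imputed, observed, masks, eps=0.5, n_iters=200):
    # ASSUMPTION: masking means comparing only the observed coordinates.
    return sinkhorn_divergence(apply_mask(imputed, masks),
                               apply_mask(observed, masks), eps, n_iters)


# Toy batch: None marks a missing value; masks flag observed entries with 1.
observed = [[0.0, None], [1.0, 1.0], [None, 2.0]]
masks = [[1, 0], [1, 1], [0, 1]]
good_imputation = [[0.0, 0.7], [1.0, 1.0], [1.5, 2.0]]  # matches observed cells
bad_imputation = [[5.0, 0.7], [6.0, 6.0], [1.5, 7.0]]   # distorts observed cells

loss_good = masked_sinkhorn_divergence(good_imputation, observed, masks)
loss_bad = masked_sinkhorn_divergence(bad_imputation, observed, masks)
```

In this sketch the loss depends only on observed cells, so an imputation that reproduces every observed value scores lower than one that distorts them. Because each step is built from smooth operations, the same computation written in an automatic differentiation framework would provide gradients with respect to the imputed values, which is the property the abstract attributes to DIM.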