EGAIN: Enhanced Generative Adversarial Networks for Imputing Missing Values

TMLR Paper4700 Authors

18 Apr 2025 (modified: 24 Jul 2025), Rejected by TMLR, CC BY 4.0

Abstract: Missing values pose a challenge in predictive analysis, especially in big data, because most models depend on complete datasets to estimate functional relationships between variables. Generative Adversarial Imputation Networks (GAIN) are among the most reliable methods to predict and impute missing values. This research introduces Enhanced Generative Adversarial Networks (EGAIN), which address the GAIN convergence issue, introduce new functionality to the GAIN process, and significantly improve its performance.

Submission Length: Regular submission (no more than 12 pages of main content)

Previous TMLR Submission Url: https://openreview.net/forum?id=9lCHLhMOiZ&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DTMLR%2FAuthors%23your-submissions)

Changes Since Last Submission: Dear Editorial team, The revised manuscript is shared with the reviewers. We thank the reviewers for their constructive and insightful feedback, which has greatly contributed to improving the quality of the manuscript. In response to the received comments, we have made several significant revisions, including:
1. Inclusion of Median Imputation as a Baseline: We have added Median imputation as a baseline method to benchmark the performance of both GAIN and EGAIN. The results demonstrate statistically significant improvements over this baseline.
2. Expanded Dataset Evaluation: We clarified the use of real-world benchmark datasets (Breast Cancer Wisconsin, Spambase, and Credit Card Default), and we have extended our evaluation by including two additional datasets, Letter Recognition and Online News Popularity, that were used in the original GAIN study. These additions provide a more comprehensive assessment of all three methods.
3. Revised Introduction: The Introduction has been revised to clearly articulate the key enhancements introduced by EGAIN, with a specific focus on distinguishing our architectural and training improvements from those of the original GAIN implementation.
4. Improved Methods Section: The "Materials and Methods" section has been reorganized and clarified to better present EGAIN’s innovations and provide a more transparent description of the experimental setup.
5. Updated Discussion: We have revised the Discussion section to include a dedicated explanation of the motivation and rationale for using convolutional layers in tabular data settings.
6. Corrected Percent Change Formula: The Percent Change formula was incorrectly applied as 100*(new-old)/new and has been corrected to 100*(new-old)/old.
All changes have been marked in blue in the revised manuscript to facilitate review and tracking.
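The percent-change correction mentioned in the change notes can be illustrated with a minimal sketch. The function names and the numeric values below are hypothetical, chosen only to show how the two denominators differ; they are not taken from the manuscript or its results.

```python
def percent_change(old: float, new: float) -> float:
    """Percent change relative to the old (baseline) value: 100*(new-old)/old."""
    if old == 0:
        raise ValueError("percent change is undefined when the baseline is 0")
    return 100.0 * (new - old) / old


def percent_change_wrong_denominator(old: float, new: float) -> float:
    """The incorrect variant that divides by the new value: 100*(new-old)/new."""
    if new == 0:
        raise ValueError("this variant is undefined when the new value is 0")
    return 100.0 * (new - old) / new


# Hypothetical error values, for illustration only (not results from the paper).
baseline_error = 0.5
improved_error = 0.25

correct = percent_change(baseline_error, improved_error)
incorrect = percent_change_wrong_denominator(baseline_error, improved_error)

print(f"correct (divide by old): {correct:.1f}%")
print(f"incorrect (divide by new): {incorrect:.1f}%")
```

With these illustrative inputs, dividing by the new value gives a larger magnitude than dividing by the baseline, so when the new value is the smaller one the incorrect variant overstates the change.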

Assigned Action Editor: ~Jes_Frellsen1

Submission Number: 4700
