Comparative Analysis of Oversampling Techniques and Deep Learning for Imbalanced Tabular Data

Alex X. Wang, Colin R. Simpson, Binh P. Nguyen

Published: 01 Jan 2024, Last Modified: 22 Jun 2025, TENCON 2024, License: CC BY-SA 4.0

Abstract: Imbalanced datasets, in which some classes are heavily underrepresented, pose a significant challenge in machine learning because they degrade model performance and robustness. This paper presents a comprehensive comparative study of traditional data balancing techniques and explores the potential of deep learning-based algorithms for synthesizing tabular data. Focusing on Generative Adversarial Networks, Variational Autoencoders, Transformer-based models, and Diffusion models, we evaluate the practical applicability of these methods for generating high-quality synthetic data and for improving model performance on imbalanced datasets. The study highlights both the potential and the limitations of deep generative models and suggests that further research is needed on integrating deep learning with traditional balancing methods.
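The abstract does not name the specific traditional balancing techniques that were compared. As an illustration only, the sketch below shows SMOTE-style oversampling (Synthetic Minority Over-sampling Technique), a widely used traditional baseline. It is an assumption that a method of this kind is among those studied. Each synthetic row is a random interpolation between a minority-class row and one of its k nearest minority-class neighbours. The function name `smote_like_oversample`, its parameters, and the toy data are hypothetical, and the code is a plain-NumPy sketch rather than the paper's or any library's implementation.

```python
import numpy as np

def smote_like_oversample(X_minority, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority rows by
    interpolating between a sampled minority row and one of its k nearest
    minority neighbours (SMOTE-style sketch, not a library API)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two minority rows to interpolate")
    k = min(k, n - 1)  # a row cannot have more neighbours than other rows
    # Pairwise squared Euclidean distances between minority rows
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude each row as its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                # seed rows
    picks = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                          # factor in [0, 1)
    return X[base] + gap * (X[picks] - X[base])

# Toy imbalanced data (assumed): 95 majority rows vs. 5 minority rows
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(95, 2))
X_min = rng.normal(3.0, 0.5, size=(5, 2))
X_syn = smote_like_oversample(X_min, n_new=len(X_maj) - len(X_min), k=3)
print(X_syn.shape)  # (90, 2)
```

Because every synthetic row is a convex combination of two real minority rows, this baseline can only fill in the region spanned by the existing minority samples. Deep generative models such as those listed in the abstract instead try to learn the minority-class distribution itself, which is one reason to compare the two families.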