Abstract: Multi-class imbalance problems are non-standard, derivative data science problems. They stem from skewness in the underlying data distribution, which in turn raises numerous issues for conventional machine learning techniques. To address the lack of data for underrepresented classes, we can either collect new data or oversample those classes by synthesizing artificial instances from the original ones. This paper focuses on the latter and introduces two novel tabular GAN variants for handling multi-class imbalance problems. Empirical results on three datasets from the UCI repository show that the proposed approaches, which use our filtering algorithm based on neighboring rules, improve the ability of a decision tree classifier to recognize underrepresented class instances, decrease its bias toward the majority class, and enhance its generalization ability.
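The abstract does not spell out the filtering algorithm, so the following is only a minimal sketch of what a neighbor-based filter for synthetic samples could look like, not the paper's method. It assumes a simple k-nearest-neighbor majority rule: a synthetic row is kept only if most of its nearest real rows carry the class it was generated for. The function name `knn_filter`, the parameter `k`, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def knn_filter(real_X, real_y, synth_X, synth_y, k=3):
    """Keep synthetic rows whose k nearest real neighbours mostly share their label.

    Illustrative assumption only: a plain k-NN majority rule, not the
    filtering algorithm proposed in the paper.
    """
    kept = []
    for x, label in zip(synth_X, synth_y):
        # Rank real rows by Euclidean distance to the synthetic row.
        nearest = sorted(
            range(len(real_X)), key=lambda i: math.dist(x, real_X[i])
        )[:k]
        votes = Counter(real_y[i] for i in nearest)
        # Require a strict majority of neighbours in the intended class.
        if votes[label] * 2 > len(nearest):
            kept.append((x, label))
    return kept


# Toy data (hypothetical): two well-separated classes in two dimensions.
real_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
real_y = ["minority", "minority", "minority", "majority", "majority", "majority"]

# Two synthetic "minority" rows: one plausible, one inside the majority region.
synth_X = [(0.15, 0.1), (4.8, 5.0)]
synth_y = ["minority", "minority"]

kept = knn_filter(real_X, real_y, synth_X, synth_y, k=3)
print(kept)
```

In this toy run the synthetic row that lands in the majority region is discarded, which is the kind of noise such a filter is meant to remove before the oversampled data reaches the classifier.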