DG-SMOTE: A Distance-Angle-Based Genetic Synthetic Minority Over-Sampling Technique for Unbalanced Data Learning

Published: 2025, Last Modified: 07 Jan 2026, IEEE Trans. Evol. Comput. 2025, License: CC BY-SA 4.0
Abstract: Many real-world applications generate unbalanced data. Learning from such data can produce biased classifiers that perform poorly on the class of interest. Oversampling methods have been shown to be effective at rebalancing unbalanced data, which helps classifiers avoid this performance bias. However, many existing oversampling methods rely on a predesigned linear model structure and on the neighborhood information of an original instance, so they may generate noisy instances when the original data contains noise. In this study, we develop a novel oversampling method that uses genetic programming to automatically select good-quality instances and to evolve a model structure that combines the selected instances into a new instance. In the proposed method, each individual represents one generated instance and is evaluated by a fitness function based on the Euclidean distance and the cosine theorem. In the experiments, we examine how well the proposed method helps different types of classifiers handle class imbalance, and we compare it with popular sampling methods for unbalanced classification. A comprehensive analysis of the results indicates that the new method successfully addresses the class imbalance issue by generating a group of good-quality instances for the minority class, and that it outperforms the compared sampling methods in almost all cases.
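For readers less familiar with the baseline, the following Python sketch illustrates the "predesigned linear model structure" used by classic SMOTE: a synthetic point is placed at a random position on the segment between a minority instance and one of its k nearest minority neighbors. It also shows how the cosine theorem (law of cosines) turns three pairwise Euclidean distances into an angle. This is a minimal illustration under stated assumptions, not DG-SMOTE itself: the abstract does not specify the fitness function or the evolved structure, and all helper names (`euclidean`, `k_nearest`, `smote_sample`, `angle_at`), the toy data, and the value of k are hypothetical choices made for this example.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(point: Vector, others: List[Vector], k: int) -> List[Vector]:
    """The k vectors in `others` closest to `point`, skipping `point` itself."""
    candidates = [o for o in others if o is not point]
    return sorted(candidates, key=lambda o: euclidean(point, o))[:k]


def smote_sample(minority: List[Vector], k: int = 5, rng=random) -> Vector:
    """Classic SMOTE step (the fixed linear structure, not DG-SMOTE):
    x_new = x + lam * (x_nn - x), with lam drawn uniformly from [0, 1)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    x = rng.choice(minority)
    x_nn = rng.choice(k_nearest(x, minority, k))
    lam = rng.random()
    return [xi + lam * (ni - xi) for xi, ni in zip(x, x_nn)]


def angle_at(vertex: Sequence[float], p: Sequence[float], q: Sequence[float]) -> float:
    """Angle (radians) at `vertex` in triangle (vertex, p, q) via the cosine theorem:
    c^2 = a^2 + b^2 - 2ab*cos(C)  =>  cos(C) = (a^2 + b^2 - c^2) / (2ab).
    Hypothetical helper: this is NOT the paper's fitness function."""
    a = euclidean(vertex, p)
    b = euclidean(vertex, q)
    c = euclidean(p, q)
    if a == 0.0 or b == 0.0:
        return 0.0  # angle undefined when vertex coincides with p or q; 0.0 by convention
    cos_c = (a * a + b * b - c * c) / (2.0 * a * b)
    return math.acos(max(-1.0, min(1.0, cos_c)))  # clamp float round-off into [-1, 1]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy minority class lying on the line y = x.
    minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    new = smote_sample(minority, k=2, rng=rng)
    print("synthetic instance:", new)  # lies on y = x, between its two parents
    # A point on the segment between two instances forms a straight angle with them.
    print("angle at midpoint (deg):", math.degrees(angle_at([0.5, 0.5], [0.0, 0.0], [1.0, 1.0])))
```

Because every SMOTE sample lies on the segment joining two minority instances, a noisy parent pulls the synthetic point toward the noise, which is the limitation the abstract attributes to fixed linear structures. Measuring distances and angles between a candidate and existing instances is one way to score candidates without committing to that structure; how DG-SMOTE actually combines these quantities is left to the full paper.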