Abstract: The class imbalance problem arises in many real-world applications, such as fraud detection, medical diagnosis and spam filtering, and seriously degrades the performance of learning algorithms. Random undersampling is a widely used remedy. However, because it selects samples at random, it cannot reliably extract the samples near the cross-edge (boundary) between the majority and minority classes. These samples matter most to a classifier, because they strongly influence its classification performance. In this paper, we propose a novel method, Gaussian Mixture Undersampling (GMUS). GMUS consists of three main steps. First, a Gaussian Mixture Model (GMM) is fitted to the majority samples. Second, the probability density function (PDF) of the fitted GMM is evaluated at the minority samples, and the maximum of these PDF values is taken as the cross-edge between the two classes. Finally, the majority class is undersampled by selecting the majority samples near the cross-edge. Experiments on 16 public datasets demonstrate that GMUS samples more informative instances and thus improves classifier performance compared with state-of-the-art undersampling methods. We also apply GMUS to credit card fraud detection and obtain good performance.
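The abstract gives only an outline of GMUS, so the following is a minimal sketch of one possible reading of its three steps, not the authors' implementation. Assumptions made here: the GMM uses diagonal covariances fitted by plain EM in NumPy with a fixed number of components `k`; the cross-edge level is the largest log-density that any minority sample receives under the majority GMM; "near the cross-edge" means the majority samples whose log-density is closest to that level; and by default as many majority samples are kept as there are minority samples. The function names `fit_diag_gmm`, `gmm_log_density` and `gmus_undersample` are hypothetical.

```python
import numpy as np


def _logsumexp_rows(a):
    """Numerically stable log(sum(exp(a))) over axis 1 of a 2-D array."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()


def _component_log_pdf(X, means, variances):
    """Log-density of every sample under every diagonal Gaussian, shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]                     # (n, k, d)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)    # (k,)
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)     # (n, k)
    return -0.5 * (log_det[None, :] + maha)


def fit_diag_gmm(X, k=3, n_iter=100, seed=0, reg=1e-6):
    """Step 1 (assumed form): fit a diagonal-covariance GMM to X with plain EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)].astype(float)
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        log_joint = _component_log_pdf(X, means, variances) + np.log(weights)
        resp = np.exp(log_joint - _logsumexp_rows(log_joint)[:, None])
        # M-step: re-estimate weights, means and per-dimension variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ (X ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, 0.0) + reg
    return weights, means, variances


def gmm_log_density(X, weights, means, variances):
    """Log of the mixture PDF evaluated at each row of X."""
    return _logsumexp_rows(_component_log_pdf(X, means, variances) + np.log(weights))


def gmus_undersample(X_maj, X_min, n_keep=None, k=3, seed=0):
    """Return indices of the majority samples kept by this GMUS-style sketch."""
    params = fit_diag_gmm(X_maj, k=k, seed=seed)
    # Step 2 (assumed form): the largest density any minority sample receives
    # under the majority GMM is treated as the cross-edge level.
    edge = gmm_log_density(X_min, *params).max()
    # Step 3 (assumed form): keep the majority samples whose log-density is
    # closest to that level; default to a 1:1 class ratio.
    if n_keep is None:
        n_keep = len(X_min)
    gap = np.abs(gmm_log_density(X_maj, *params) - edge)
    return np.argsort(gap, kind="stable")[:n_keep]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_maj = rng.normal(0.0, 1.0, size=(500, 2))   # synthetic majority class
    X_min = rng.normal(2.5, 0.5, size=(50, 2))    # synthetic minority class
    keep = gmus_undersample(X_maj, X_min)
    X_bal = np.vstack([X_maj[keep], X_min])
    print(X_bal.shape)  # (100, 2)
```

Measuring closeness in log-density rather than raw density, and restricting the GMM to diagonal covariances, are choices made here for numerical stability and brevity; the paper may use a different covariance type, number of components, or selection rule.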