Multi-class Imbalanced Data Classification by Deep Multi-set Discriminant Metric Learning with Optimal Balance Sampling
Abstract: Data classification is one of the core technologies of data mining, and it has great scientific significance and commercial value because it is widely used in medical science, bioinformatics, and computer science. However, real-world data are often class-imbalanced. This causes minority-class samples to be misclassified into the majority class, which reduces the performance of the classifier. Compared with the two-class scenario, multi-class imbalanced data classification is more difficult, since samples from several different minority classes can be misclassified into the majority classes. In this paper, we propose a novel class-imbalanced learning model for multi-class imbalanced data classification. Specifically, we first define a strategy that constructs multiple balanced subsets by optimal balance sampling, and then design a deep multi-set discriminant metric learning network to learn features from the multiple subsets. Extensive experimental results on four typical class-imbalanced datasets from three important fields demonstrate that, compared with state-of-the-art methods, our approach improves the average classification performance by 4.02% on contraceptive, 7.82% on yeast, 5.50% on mushroom, and 4.12% on pageblocks.
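To make the idea of multiple balanced subsets concrete, the following is a minimal sketch only. It assumes plain random undersampling to the size of the smallest class, which is a stand-in and not the optimal balance sampling proposed in the paper. The function name `build_balanced_subsets` and its parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_balanced_subsets(labels, n_subsets=3, seed=0):
    """Return n_subsets lists of sample indices, each with equal counts per class.

    Assumption: every class is randomly undersampled to the size of the
    smallest class. This is a simple stand-in, not the paper's sampling rule.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Every subset takes this many samples from each class.
    per_class = min(len(idx) for idx in by_class.values())

    subsets = []
    for _ in range(n_subsets):
        subset = []
        for y in sorted(by_class):
            # Sample without replacement inside one class.
            subset.extend(rng.sample(by_class[y], per_class))
        subsets.append(sorted(subset))
    return subsets


# Toy imbalanced label vector: 50 / 12 / 5 samples in classes 0 / 1 / 2.
labels = [0] * 50 + [1] * 12 + [2] * 5
subsets = build_balanced_subsets(labels, n_subsets=4)
for s in subsets:
    print(Counter(labels[i] for i in s))
```

Each subset contains every class in equal proportion, so a separate feature learner could be trained per subset. How the subsets are actually chosen and combined in the proposed network is not specified in the abstract.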