Sample-pair learning network for extremely imbalanced classification

Published: 01 Jan 2025 · Last Modified: 11 Apr 2025 · Neurocomputing 2025 · CC BY-SA 4.0
Abstract: In data classification, class-balanced data is ideal, but real datasets are often imbalanced, necessitating rebalancing through methods such as resampling. In recent years, several generative model-based resampling methods have been proposed. However, these methods struggle under extreme class imbalance, where the minority class is so strongly underrepresented that on its own it does not contain enough information to drive the generative process. Deep learning methods have also been proposed for extremely imbalanced classification, but some of them apply only to specific datasets. We therefore propose a novel deep learning method that combines a generative strategy with multi-task joint learning, termed the sample-pair learning network (SPLN), for extremely imbalanced classification. The network consists of a data preprocessing module and a multi-task joint learning module. During data preprocessing, the training set is expanded by constructing positive and negative sample-pairs and then rebalanced with a strategy that combines attention and resampling, termed undersampling based on attention power values (APVUS). The multi-task joint learning module employs a Siamese convolutional subnetwork to measure the similarity between sample-pairs and a multi-layer perceptron to recognize the category of single samples; joint training reduces the risk of overfitting caused by excessive noise in the expanded training set. Finally, we design a voting model based on the Siamese convolutional subnetwork to infer the categories of test samples. Experimental results demonstrate that our approach outperforms state-of-the-art generative model-based methods and is both effective and general for extremely imbalanced classification.
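The abstract's core idea of expanding an imbalanced training set via positive/negative sample-pairs can be illustrated with a minimal sketch. This is not the authors' code: the function name `make_sample_pairs` and the exhaustive pairing over all sample combinations are assumptions for illustration; the paper's actual construction and the APVUS rebalancing step are not reproduced here.

```python
import itertools

def make_sample_pairs(samples, labels):
    """Build labeled sample-pairs from a (possibly imbalanced) training set.

    A pair is 'positive' (pair label 1) when both members share a class,
    'negative' (pair label 0) otherwise. Pairing expands the minority class
    quadratically: k minority samples yield k*(k-1)/2 positive pairs, which
    is why pair construction helps when the minority class alone is too
    small to support a generative model.
    """
    pairs, pair_labels = [], []
    for (i, xi), (j, xj) in itertools.combinations(enumerate(samples), 2):
        pairs.append((xi, xj))
        pair_labels.append(1 if labels[i] == labels[j] else 0)
    return pairs, pair_labels

# Toy example: 6 majority samples, 2 minority samples.
X = list(range(8))
y = [0] * 6 + [1] * 2
pairs, pl = make_sample_pairs(X, y)
print(len(pairs))  # 28 pairs from 8 samples (8 choose 2)
print(sum(pl))     # 16 positive pairs: C(6,2) + C(2,2) = 15 + 1
```

Note that exhaustive pairing also multiplies label noise, which is one stated motivation for the joint single-sample recognition task in SPLN.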
