Important sampling based active learning for imbalance classificationDownload PDFOpen Website

2020 (modified: 02 Nov 2022)Sci. China Inf. Sci. 2020Readers: Everyone
Abstract: Imbalance in data distribution hinders the learning performance of classifiers. To solve this problem, a popular type of methods is based on sampling (including oversampling for minority class and undersampling for majority class) so that the imbalanced data becomes relatively balanced data. However, they usually focus on one sampling technique, oversampling or undersampling. Such strategy makes the existing methods suffer from the large imbalance ratio (the majority instances size over the minority instances size). In this paper, an active learning framework is proposed to deal with imbalanced data by alternative performing important sampling (ALIS), which consists of selecting important majority-class instances and generating informative minority-class instances. In ALIS, two important sampling strategies affect each other so that the selected majority-class instances provide much clearer information in the next oversampling process, meanwhile the generated minority-class instances provide much more sufficient information for the next undersampling procedure. Extensive experiments have been conducted on real world datasets with a large range of imbalance ratio to verify ALIS. The experimental results demonstrate the superiority of ALIS in terms of several well-known evaluation metrics by comparing with the state-of-the-art methods.
0 Replies

Loading