Constructive sample partition-based parameter-free sampling for class-overlapped imbalanced data classification
Abstract: Imbalanced data is common in real applications ranging from medical diagnosis to economic fraud detection. Data-level methods, which re-balance the distribution between classes, are among the most prevalent ways to deal with imbalanced data. Recent research shows that handling class overlap when designing a data-level approach can effectively improve the performance of imbalanced learning. However, most existing data-level methods rely on specific parameters to achieve the desired performance, which makes them hard to generalize to other scenarios. Intractable data difficulty factors, most frequently class overlap, pose additional challenges for these methods. Designing an efficient, flexible method that is parameter-free and also handles class overlap remains a challenge. This paper proposes a parameter-free adaptive approach to class-overlapped imbalanced data. Specifically, we first propose a parameter-free constructive sample partition (CSP) method, and then design an adaptive parameter-free CSP-based undersampling method (CSPUS) and an adaptive parameter-free CSP-based hybrid sampling method (CSPHS), both of which balance the class distribution by handling the class overlap in the original data. Numerical experiments on 18 representative high-overlap imbalanced datasets from the KEEL repository, with comparisons against 23 state-of-the-art methods, demonstrate the effectiveness of CSPUS and CSPHS. The source code of our proposed methods is available at https://github.com/ytyancp/CSPS.