MixBoost: Synthetic Oversampling using Boosted Mixup for Handling Extreme Imbalance

2020 (modified: 25 Oct 2021), ICDM 2020, Readers: Everyone
Abstract: Training a classification model on a dataset where the instances of one class outnumber those of the other class is a challenging problem. Such imbalanced datasets are standard in real-world situations such as fraud detection, medical diagnosis, and customer churn prediction. We propose a data augmentation method, MixBoost, which intelligently selects (Boost) and then combines (Mix) instances from the majority and minority classes to generate synthetic hybrid instances that have elements of both classes. We evaluate MixBoost on 20 benchmark datasets and show that it outperforms existing approaches. We evaluate the impact of the different components of MixBoost using ablation studies.
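
The "Mix" step described in the abstract can be illustrated with a minimal sketch, assuming it resembles standard mixup: a convex combination of one minority and one majority instance, with the mixing weight drawn from a Beta distribution. This is not the authors' implementation. The function name `mixup_oversample`, the `alpha` parameter, and the soft-label convention are hypothetical, and the "Boost" selection step is replaced here by uniform random sampling because the abstract does not specify it.

```python
import random


def mixup_oversample(majority, minority, n_new, alpha=0.5, seed=0):
    """Create n_new hybrid instances, each mixing one minority and one majority instance.

    Returns (features, soft_labels). A soft label is the weight placed on the
    minority instance, so 1.0 means pure minority and 0.0 means pure majority.
    """
    rng = random.Random(seed)
    features, soft_labels = [], []
    for _ in range(n_new):
        # Assumption: uniform random selection stands in for the paper's
        # "Boost" selection step, which the abstract does not describe.
        x_maj = rng.choice(majority)
        x_min = rng.choice(minority)
        # Mixing weight in [0, 1], as in standard mixup.
        lam = rng.betavariate(alpha, alpha)
        # Convex combination of the two instances, feature by feature.
        hybrid = [lam * a + (1.0 - lam) * b for a, b in zip(x_min, x_maj)]
        features.append(hybrid)
        soft_labels.append(lam)
    return features, soft_labels


# Toy data: four majority points near the origin, one minority point at (1, 1).
majority = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
minority = [[1.0, 1.0]]
X_new, y_new = mixup_oversample(majority, minority, n_new=5)
```

Each synthetic instance lies on the line segment between a majority and a minority point, which is one way to read "hybrid instances that have elements of both classes". The synthetic instances would then be appended to the training set before fitting a classifier.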