An active learning budget-based oversampling approach for partially labeled multi-class imbalanced data streams

Published: 01 Jan 2023, Last Modified: 06 Feb 2025, Venue: SAC 2023, License: CC BY-SA 4.0
Abstract: Learning classification models from multi-class imbalanced data streams is a challenging task in machine learning. Moreover, it is commonly assumed that all instances are labeled and available for training. This assumption is unrealistic in real-world scenarios, where only part of the stream is labeled. In this work, we propose an active learning method based on a labeling budget that can tackle multi-class imbalanced data, concept drift, and limited access to labels. The proposed method combines information from budget constraints and dynamic class ratios to generate new relevant instances. We performed experiments on 18 real-world data streams and 11 semi-synthetic data streams under different labeling budgets to evaluate the proposed method across a varied set of scenarios. The experimental study showed that our oversampling method improved the performance of state-of-the-art classifiers for multi-class imbalanced data streams under strict budgets and outperformed previously proposed oversampling methods in the domain.
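The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how a labeling budget and dynamic class ratios could jointly drive oversampling on a stream. Everything in it is an assumption made for illustration: the class name `BudgetedOversampler`, its parameters (`budget`, `decay`, `max_copies`, `threshold`), the uncertainty-threshold query rule, and the exponentially decayed class counts. It also substitutes simple replay (training on a labeled instance several times) for the generation of new instances described in the abstract.

```python
from collections import defaultdict


class BudgetedOversampler:
    """Hypothetical helper, not the paper's method.

    Decides (1) whether a label may be requested under a budget and
    (2) how many times a labeled instance should be replayed, based on
    decayed class frequencies.
    """

    def __init__(self, budget=0.2, decay=0.99, max_copies=5, threshold=0.5):
        self.budget = budget          # max fraction of instances to label
        self.decay = decay            # forgetting factor for class counts
        self.max_copies = max_copies  # cap on replays per instance
        self.threshold = threshold    # min uncertainty that triggers a query
        self.seen = 0
        self.labeled = 0
        self.class_weight = defaultdict(float)

    def wants_label(self, uncertainty):
        """Return True if a label should be requested for this instance."""
        self.seen += 1
        # Query only if doing so keeps labeled/seen within the budget.
        if (self.labeled + 1) / self.seen > self.budget:
            return False
        if uncertainty >= self.threshold:
            self.labeled += 1
            return True
        return False

    def copies_for(self, label):
        """Update decayed class counts and return the replay count."""
        # Decay all counts so that old class ratios fade under drift.
        for c in self.class_weight:
            self.class_weight[c] *= self.decay
        self.class_weight[label] += 1.0
        largest = max(self.class_weight.values())
        # Rarer classes (relative to the largest) are replayed more often.
        ratio = largest / self.class_weight[label]
        return min(self.max_copies, max(1, round(ratio)))
```

In a stream loop, one would call `wants_label` with the classifier's uncertainty for each incoming instance and, when it returns `True`, train the incremental classifier `copies_for(label)` times on that instance. The decay factor lets the replay counts follow changing class ratios instead of the ratios seen over the whole stream.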