Adaptive Random Forests with Resampling for Imbalanced Data Streams

Luis Eduardo Boiko Ferreira, Heitor Murilo Gomes, Albert Bifet, Luiz S. Oliveira

2019 (modified: 06 Feb 2025) | IJCNN 2019 | Readers: Everyone

Abstract: The large volume of real-time data generated by computer networks, smartphones, wearables and a wide range of sensors is only useful if it can be processed efficiently enough for individuals to make timely decisions based on it. In this context, machine learning techniques are widely used. While these techniques perform better than humans in such tasks, every machine learning algorithm has a certain intrinsic bias: it assumes that the data have specific characteristics, such as a balanced distribution between classes. As many real-world applications present imbalanced data, this topic has been gaining attention over time. In this work, we present the Adaptive Random Forest with Resampling (ARF_RE), a classifier designed to deal with imbalanced datasets. ARF_RE resamples the instances based on the current class label distribution. We show through an extensive set of experiments on seven datasets that the proposed method can considerably improve performance on the minority class(es) while avoiding degraded performance on the majority class. On top of that, ARF_RE is more efficient in terms of execution time than the standard ARF algorithm.
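
The abstract does not spell out the resampling rule, so the following is only a minimal sketch of one way to resample instances from the current class label distribution. It is not the authors' implementation. The class name `ClassAwareResampler`, the scaling rule (rarest-class count divided by the instance's class count) and the base rate of 6.0 are all assumptions chosen for illustration. Each incoming instance receives a Poisson-distributed training weight, as in online bagging, and the rate shrinks for frequent classes.

```python
import math
import random
from collections import Counter


class ClassAwareResampler:
    """Hypothetical helper: Poisson training weights scaled by class rarity."""

    def __init__(self, base_lambda=6.0, seed=0):
        self.base_lambda = base_lambda  # assumed base rate, not taken from the abstract
        self.counts = Counter()         # running class label counts seen so far
        self.rng = random.Random(seed)

    def update(self, label):
        """Record one observed label from the stream."""
        self.counts[label] += 1

    def rate(self, label):
        """Poisson rate for a label: full rate for the rarest class, less for others."""
        seen = self.counts[label]
        if seen == 0:
            return self.base_lambda  # unseen class: treat it as rare
        rarest = min(self.counts.values())
        return self.base_lambda * rarest / seen

    def weight(self, label):
        """Draw a training weight k ~ Poisson(rate(label)) with Knuth's method."""
        threshold = math.exp(-self.rate(label))
        k, p = 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= threshold:
                return k
            k += 1


# Toy stream: 90 majority labels followed by 10 minority labels.
resampler = ClassAwareResampler()
for label in ["neg"] * 90 + ["pos"] * 10:
    resampler.update(label)

minority_rate = resampler.rate("pos")
majority_rate = resampler.rate("neg")
```

Under this assumed rule, majority instances are often drawn with weight zero and skipped by a given tree. That is one plausible reason a resampling variant could run faster than a forest that trains on every instance at the full rate.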
