Improving class and group imbalanced classification with uncertainty-based active learning

Published: 27 Oct 2023, Last Modified: 22 Dec 2023, RealML-2023
Keywords: Active learning, Imbalanced data, Worst-group performance
TL;DR: Uncertainty sampling can improve worst-subpopulation accuracy in imbalanced classification problems, outperforming even methods specialized for this setting, such as reweighting.
Abstract: Recent experimental and theoretical analyses have revealed that uncertainty-based active learning algorithms (U-AL) often fail to improve average accuracy over even the simple baseline of passive learning (PL). In this work, however, we show that U-AL is a competitive method for problems with severe data imbalance when the focus is worst-subpopulation accuracy rather than average accuracy. In extensive experiments, U-AL outperforms algorithms that explicitly aim to improve worst-subpopulation performance, such as reweighting. We provide insights that explain the good performance of U-AL and present a theoretical result that is supported by our experimental observations.
Submission Number: 61
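The abstract contrasts two ingredients: an uncertainty-based query rule (U-AL) and an evaluation metric that looks at the worst subpopulation instead of the average. The abstract does not say which uncertainty measure, model, or group definition is used, so the sketch below is only an illustration under stated assumptions: it uses margin-based uncertainty sampling (one common instance of U-AL) and defines worst-subpopulation accuracy as the minimum per-group accuracy. The helper names `margin_uncertainty_sample`, `worst_group_accuracy`, and `average_accuracy` are hypothetical, and the toy numbers are made up purely for illustration.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def margin_uncertainty_sample(probs: Sequence[Sequence[float]], k: int) -> list[int]:
    """Hypothetical helper: pick the k pool points with the smallest top-2 margin.

    Assumption: margin sampling stands in for "uncertainty-based" querying.
    probs[i] is the model's predicted class distribution for unlabeled point i;
    a small gap between the two most likely classes means high uncertainty.
    """
    scored = []
    for i, p in enumerate(probs):
        top2 = sorted(p, reverse=True)[:2]
        margin = top2[0] - (top2[1] if len(top2) > 1 else 0.0)
        scored.append((margin, i))
    scored.sort()  # ascending margin: most uncertain first, ties broken by index
    return [i for _, i in scored[:k]]


def average_accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of all points classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def worst_group_accuracy(
    y_true: Sequence, y_pred: Sequence, groups: Sequence[Hashable]
) -> float:
    """Assumed definition: minimum accuracy over the subpopulations in `groups`."""
    correct: dict = defaultdict(int)
    total: dict = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return min(correct[g] / total[g] for g in total)


# Toy pool of predicted probabilities (illustrative values only).
pool_probs = [[0.9, 0.1], [0.55, 0.45], [0.5, 0.5], [0.7, 0.3]]
print(margin_uncertainty_sample(pool_probs, k=2))  # [2, 1]

# Toy imbalanced evaluation set: the minority group is always misclassified.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 0]
groups = ["maj", "maj", "maj", "min", "min"]
print(average_accuracy(y_true, y_pred))  # 0.6
print(worst_group_accuracy(y_true, y_pred, groups))  # 0.0
```

The toy evaluation shows why the choice of metric matters: a classifier can look acceptable on average (0.6) while failing every point in the minority group (worst-group accuracy 0.0). A passive learning baseline would replace `margin_uncertainty_sample` with uniform random selection from the pool.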