Keywords: Active learning, Imbalanced data, Worst-group performance
TL;DR: Uncertainty sampling can improve worst-subpopulation accuracy in imbalanced classification problems, outperforming even methods specialized for this setting, such as reweighting.
Abstract: Recent experimental and theoretical analyses have revealed that
uncertainty-based active learning algorithms (U-AL) are often not able to
improve the average accuracy compared to even the simple baseline of passive
learning (PL). However, we show in this work that U-AL is a competitive
method in problems with severe data imbalance when the focus is the
\emph{worst-subpopulation} accuracy rather than the \emph{average} accuracy.
We show in extensive experiments that U-AL outperforms algorithms that
explicitly aim to improve worst-subpopulation performance, such as reweighting.
We provide insights that explain the good performance of U-AL and show a
theoretical result that is supported by our experimental observations.
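To make the querying strategy concrete: uncertainty-based active learning repeatedly asks for labels of the unlabeled points the current model is least sure about. The sketch below is a minimal, hypothetical illustration of binary least-confidence sampling (one common instance of uncertainty sampling), not the paper's exact algorithm; the function name and scoring rule are illustrative assumptions.

```python
import numpy as np

def uncertainty_sample(probs, k):
    """Illustrative least-confidence acquisition for binary classification:
    select the k unlabeled points whose predicted positive-class probability
    is closest to 0.5 (i.e., closest to the decision boundary).
    NOTE: a hypothetical sketch, not the paper's specific U-AL procedure."""
    probs = np.asarray(probs, dtype=float)
    # Uncertainty score: 1.0 at p = 0.5, 0.0 at p in {0, 1}.
    uncertainty = 1.0 - 2.0 * np.abs(probs - 0.5)
    # Indices of the k highest-uncertainty points, to be sent for labeling.
    return np.argsort(-uncertainty)[:k]

# Points near the boundary (0.52 and 0.48) are queried first;
# confidently classified points (0.95, 0.10, 0.99) are skipped.
probs = [0.95, 0.52, 0.10, 0.48, 0.99]
print(uncertainty_sample(probs, 2))
```

Intuitively, under severe imbalance the model is most uncertain near the minority subpopulation's decision boundary, so this rule tends to oversample exactly the points the worst-performing group needs.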
Submission Number: 61