Low-regret Active Learning

TMLR Paper212 Authors

26 Jun 2022 (modified: 28 Feb 2023), Rejected by TMLR
Abstract: We develop an online learning algorithm for identifying the unlabeled data points that are most informative for training (i.e., active learning). By formulating active learning as the problem of prediction with sleeping experts, we provide a regret minimization framework for identifying relevant data with respect to any given definition of informativeness. Motivated by the successes of ensembles in active learning, we define regret with respect to an omnipotent algorithm that has access to an infinitely large ensemble. At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on easy instances while remaining resilient to adversarial ones. Low regret implies that we can be provably competitive with an ensemble method without the computational burden of training an ensemble. This stands in contrast to state-of-the-art active learning methods, which are overwhelmingly based on greedy selection and hence cannot ensure good performance across problem instances with high amounts of noise. We present empirical results demonstrating that our method (i) instantiated with an informativeness measure consistently outperforms its greedy counterpart and (ii) reliably outperforms uniform sampling in real-world scenarios.
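To make the sleeping-experts framing concrete, here is a minimal sketch of one round of a generic multiplicative-weights update for sleeping experts. It is not the algorithm proposed in the paper, whose details the abstract does not give. The mapping used here is an assumption for illustration: each unlabeled point is an expert, a point "sleeps" once it has been labeled, and its loss is one minus an informativeness score in [0, 1]. The function name `sleeping_hedge_step` and the learning rate `eta` are hypothetical.

```python
import math


def sleeping_hedge_step(weights, awake, losses, eta=0.5):
    """One round of a generic sleeping-experts multiplicative-weights update.

    weights: list of positive floats, one per expert (assumed: one per data point)
    awake:   set of indices of experts that are awake this round
    losses:  dict mapping each awake index to a loss in [0, 1]
             (assumed here: 1 - informativeness of that point)
    Returns (probs, new_weights), where probs is the sampling distribution
    over awake experts used in this round.
    """
    awake_mass = sum(weights[i] for i in awake)
    # Sampling distribution restricted to the awake experts.
    probs = {i: weights[i] / awake_mass for i in awake}

    new_weights = list(weights)
    # Exponentially down-weight awake experts in proportion to their loss.
    for i in awake:
        new_weights[i] = weights[i] * math.exp(-eta * losses[i])

    # Rescale so the total mass of awake experts is unchanged; sleeping
    # experts keep their weights untouched.
    scale = awake_mass / sum(new_weights[i] for i in awake)
    for i in awake:
        new_weights[i] *= scale
    return probs, new_weights


# Example round: expert 2 is asleep (assumed: already labeled).
probs, updated = sleeping_hedge_step(
    [1.0, 1.0, 1.0], awake={0, 1}, losses={0: 0.0, 1: 1.0}, eta=1.0
)
```

In this sketch, points with low loss (high informativeness) gain sampling probability over rounds, while sleeping points neither gain nor lose weight.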
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Chicheng_Zhang1
Submission Number: 212