An Expanded Benchmark that Rediscovers and Affirms the Edge of Uncertainty Sampling for Active Learning
Abstract: Active Learning (AL) addresses the crucial challenge of enabling machines to gather labeled examples efficiently through strategic queries. Among the many AL strategies, Uncertainty Sampling (US) stands out as one of the most widely adopted: it queries the example(s) the current model is least certain about, and it is both straightforward and effective. Despite claims in the literature of alternatives superior to US, community-wide acceptance of any such alternative remains elusive; in fact, existing benchmarks reach conflicting conclusions on whether US remains competitive. In this study, we review the literature on AL strategies from the last decade and build the most comprehensive open-source AL benchmark to date to understand the relative merits of different AL strategies. The benchmark surpasses existing ones in its coverage of strategies, models, and data. Through extensive evaluation, we uncover fresh insights into the often-overlooked issue of model compatibility in US, which clarifies the conflicting conclusions of existing benchmarks. Notably, our findings affirm that, when paired with compatible models, US maintains a competitive edge over other strategies. These findings have practical implications and yield a concrete recipe for AL practitioners: adopt compatible query-oriented and task-oriented models and use US as the first-choice strategy, empowering informed decisions in their work.
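To make the abstract's setup concrete, below is a minimal sketch of an uncertainty-sampling loop using the least-confidence criterion, one common US variant (not necessarily the paper's exact formulation). The function `uncertainty_query`, the synthetic data standing in for an oracle, and the batch/round sizes are all illustrative assumptions, not the authors' code. Note that the same scikit-learn classifier serves as both the query-oriented and task-oriented model here, i.e., the compatible pairing the abstract advocates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_query(model, X_pool, batch_size=1):
    """Select the pool examples the model is least confident about.

    Least-confidence criterion: uncertainty(x) = 1 - max_y P(y | x),
    so we pick the examples with the smallest top-class probability.
    """
    proba = model.predict_proba(X_pool)          # shape (n_pool, n_classes)
    confidence = proba.max(axis=1)               # probability of top class
    return np.argsort(confidence)[:batch_size]   # least confident first

# Illustrative AL loop on synthetic data (the labels play the oracle's role).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

labeled = list(rng.choice(len(X), size=10, replace=False))
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(5):                               # five query rounds
    # One model plays both roles: it is queried for uncertainty and
    # evaluated on the task (the "compatible" setup).
    model = LogisticRegression().fit(X[labeled], y[labeled])
    picks = uncertainty_query(model, X[pool], batch_size=5)
    queried = [pool[i] for i in picks]
    labeled += queried                           # "oracle" reveals y[queried]
    pool = [i for i in pool if i not in queried]
```

An incompatible setup would, for instance, query with the uncertainty of one model family while training a different family on the resulting labels; the abstract's claim is that avoiding such mismatches is what preserves US's edge.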
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Andreas_Kirsch1
Submission Number: 3257