Keywords: Active Learning, Benchmark
TL;DR: An Active Learning benchmark that implements previously proposed evaluation protocols across multiple data domains
Abstract: Active Learning (AL) deals with identifying the most informative samples for
labeling to reduce data annotation costs for supervised learning tasks. AL
research suffers from the fact that performance lifts reported in the literature generalize poorly and
that experiments are typically repeated only a small number of times. To overcome
these obstacles, we propose CDALBench, the first active learning benchmark
which includes tasks in computer vision, natural language processing and tabular
learning. Furthermore, by providing an efficient, greedy oracle, CDALBench
can be evaluated with 50 runs for each experiment. We show that both the
cross-domain character and a large number of repetitions are crucial for a
sophisticated evaluation of AL research. Concretely, we show that the
superiority of specific methods varies across the different domains, making it
important to evaluate Active Learning with a cross-domain benchmark.
Additionally, we show that a large number of runs is crucial. When only
three runs are conducted, as is often done in the literature, the apparent superiority of
specific methods can vary strongly with the specific runs. This effect is so strong that, depending on the seed, even a well-established method's performance can be significantly better or significantly
worse than random on the same dataset.
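
To illustrate the point about run counts, the following is a minimal, self-contained simulation sketch (not part of CDALBench; all numbers and names are hypothetical): it draws per-seed accuracies for a method with a small true edge over a random-sampling baseline, and shows how comparisons based on only three runs can flip between "significantly better" and "significantly worse" while 50 runs recover the true ordering.

    # Hypothetical simulation: seed-dependence of few-run AL comparisons.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    # Simulated per-seed accuracies: the method has a small true edge (0.82 vs 0.80).
    method = rng.normal(loc=0.82, scale=0.03, size=50)
    random_baseline = rng.normal(loc=0.80, scale=0.03, size=50)

    def verdict(a, b, alpha=0.05):
        # Two-sample t-test; report direction only if significant.
        t, p = ttest_ind(a, b)
        if p < alpha:
            return "better" if a.mean() > b.mean() else "worse"
        return "tie"

    print("50 runs:", verdict(method, random_baseline))

    # Re-evaluate on many 3-run subsets: verdicts scatter across all three outcomes.
    counts = {"better": 0, "worse": 0, "tie": 0}
    for _ in range(1000):
        i = rng.choice(50, size=3, replace=False)
        j = rng.choice(50, size=3, replace=False)
        counts[verdict(method[i], random_baseline[j])] += 1
    print("3-run verdicts over 1000 resamples:", counts)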
Supplementary Material: zip
Submission Number: 979