Keywords: active learning, data augmentation, look-ahead acquisition, entropy, mixup
TL;DR: We propose a new algorithm that improves on existing active learning methods by accounting for the effect of data augmentation during acquisition.
Abstract: Active learning effectively collects data instances for training deep learning models when labeled data are limited and annotation costs are high. Data augmentation is another effective technique for enlarging a limited set of labeled instances. The scarcity of labeled data therefore motivates integrating data augmentation with active learning. One possible approach is a pipelined combination: it selects informative instances via the acquisition function and then generates virtual instances from the selected instances via augmentation. However, this pipelined approach does not guarantee that the virtual instances are informative. This paper proposes Look-Ahead Data Acquisition via augmentation (LADA), a framework that looks ahead to the effect of data augmentation during acquisition. To construct the acquisition function, LADA jointly considers 1) the unlabeled data instances to be selected and 2) the virtual data instances to be generated from them by data augmentation. Moreover, to generate maximally informative virtual instances, LADA optimizes the data augmentation policy to maximize the predictive acquisition score, which yields two instantiations, InfoSTN and InfoMixup. In experiments, LADA significantly improves over recent augmentation and acquisition baselines that are applied independently.
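A minimal sketch of the look-ahead idea, loosely in the spirit of InfoMixup. It is not the authors' implementation, and several parts are assumptions. Predictive entropy serves as the acquisition score. A hypothetical linear softmax classifier (`predict`) stands in for the deep model. Candidates are pairs of unlabeled instances. The joint score simply adds the entropies of the two real instances to the entropy of their mixed virtual instance. A plain grid search over the mixup ratio `lam` replaces whatever optimization of the augmentation policy the paper uses. All function names are hypothetical.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def softmax(logits: Vector) -> List[float]:
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights: Sequence[Vector], x: Vector) -> List[float]:
    # Hypothetical stand-in for the deep model: a linear softmax
    # classifier with one weight row per class.
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])


def entropy(p: Vector) -> float:
    # Predictive entropy, used here as the acquisition score.
    return -sum(q * math.log(q) for q in p if q > 0.0)


def mixup(x1: Vector, x2: Vector, lam: float) -> List[float]:
    # Virtual instance: convex combination of two inputs with ratio lam.
    return [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]


def best_mixing_ratio(weights, x1, x2, lambdas) -> Tuple[float, Optional[float]]:
    # Look ahead: find the mixing ratio whose virtual instance has the
    # highest acquisition score (grid search is an assumed simplification).
    best_score, best_lam = -math.inf, None
    for lam in lambdas:
        score = entropy(predict(weights, mixup(x1, x2, lam)))
        if score > best_score:
            best_score, best_lam = score, lam
    return best_score, best_lam


def look_ahead_select(weights, pool, lambdas, budget_pairs):
    # Score each unlabeled pair jointly: entropy of both real instances
    # plus the entropy of their best virtual (mixed) instance.
    scored = []
    for i, j in itertools.combinations(range(len(pool)), 2):
        virtual_score, lam = best_mixing_ratio(weights, pool[i], pool[j], lambdas)
        joint = (entropy(predict(weights, pool[i]))
                 + entropy(predict(weights, pool[j]))
                 + virtual_score)
        scored.append((joint, i, j, lam))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:budget_pairs]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    pool = [[2.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
    lambdas = [k / 10 for k in range(1, 10)]
    # Selects pair (0, 1) with lam = 0.5: their mix is maximally uncertain.
    print(look_ahead_select(W, pool, lambdas, budget_pairs=1))
```

A pipelined variant would rank instances by `entropy` alone and mix the winners with a fixed or random ratio, so it never checks the score of the virtual instance. The look-ahead variant folds that check into the selection itself.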
Supplementary Material: pdf
Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.