Keywords: Active Learning, NLP, robustness
TL;DR: Analysis of the robustness of different active learning strategies to variations in the learning model
Abstract: Active learning methods are useful when only a limited budget for data labelling is available. However, the most widely used methods, those based on uncertainty sampling, may suffer from an excessive dependence on the model learned during data acquisition. The resulting datasets are suboptimal when used to train models very different from those used during data creation. In this paper, we link this problem to the tendency of uncertainty sampling to select outliers, and show that methods that favour the selection of representative samples are more robust to changes in the model. We validate this experimentally on four NLP datasets.