Limitations of Active Learning With Deep Transformer Language Models

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submission, Readers: Everyone
Keywords: Active Learning, Machine Learning, Natural Language Processing
Abstract: Active Learning (AL) has the potential to reduce labeling cost when training natural language processing models, but its effectiveness with the large pretrained transformer language models that power today's NLP is uncertain. We present experiments showing that when applied to modern pretrained models, active learning offers inconsistent and often poor performance. As in prior work, we find that AL sometimes selects harmful "unlearnable" collective outliers, but we discover that some failures have a different explanation: the examples AL selects are informative but also increase training instability, reducing average performance. Our findings suggest that for some datasets this instability can be mitigated by training multiple models and selecting the best on a validation set, which we show impacts relative AL performance comparably to the outlier-pruning technique from prior work while also increasing absolute performance. Our experiments span three pretrained models, ten datasets, and four active learning approaches.
One-sentence Summary: Active learning with large transformers sometimes fails on NLP tasks, but unlike previous work, which suggests that it selects harmful outliers, we find evidence that it selects useful but hard-to-optimize examples.
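Note: the abstract's mitigation (training multiple models and keeping the best on a validation set) can be illustrated with a minimal sketch of one active-learning round. This is not the authors' code; the helpers `fine_tune`, `evaluate`, and `select_by_acquisition` are hypothetical placeholders, and the restart count and query size are arbitrary.

```python
# Illustrative sketch, not the paper's implementation: one active-learning round
# in which training instability is mitigated by restarting fine-tuning from
# several random seeds and keeping the model that scores best on validation.
# fine_tune, evaluate, and select_by_acquisition are assumed (hypothetical) helpers.

def best_of_n_round(labeled, unlabeled, val_set, n_restarts=5, query_size=100):
    # Train several models on the current labeled pool; keep the best on validation.
    best_model, best_score = None, float("-inf")
    for seed in range(n_restarts):
        model = fine_tune(labeled, seed=seed)   # e.g. fine-tune a pretrained transformer
        score = evaluate(model, val_set)        # validation accuracy / F1
        if score > best_score:
            best_model, best_score = model, score

    # Use the selected model to score the unlabeled pool and choose the next
    # batch with whatever acquisition function is in use (uncertainty, BALD, etc.).
    queried = select_by_acquisition(best_model, unlabeled, k=query_size)
    return best_model, queried
```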
Supplementary Material: zip