Keywords: pretraining, active learning, alignment, safety
Abstract: An important barrier to the safe deployment of machine learning systems is the risk of \emph{task ambiguity}, where multiple behaviors are consistent with the provided examples. We investigate whether pretrained models are better active learners, capable of asking for example labels that \textit{disambiguate} between the possible tasks a user may be trying to specify. Across a range of image and text datasets with spurious correlations, latent minority groups, or domain shifts, finetuning pretrained models on data acquired through simple uncertainty sampling achieves the same accuracy with \textbf{up to 6$\times$ fewer labels} compared to random sampling. Moreover, the examples chosen by these models come preferentially from minority classes or are informative examples where the spurious feature and class label are decorrelated. Notably, gains from active learning are not seen in unpretrained models, which do not select such examples, suggesting that the ability to actively learn is an emergent property of the pretraining process.
One-sentence Summary: Pretraining makes models better active learners, which learn up to 6x faster and can resolve task ambiguity
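The abstract's acquisition strategy, simple uncertainty sampling, selects the unlabeled pool examples on which the current model is least confident. A minimal sketch of one common variant (maximum predictive entropy) is below; the function name, the toy pool, and the choice of entropy as the uncertainty score are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def uncertainty_sample(probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k pool examples with the highest predictive entropy.

    probs: (n_pool, n_classes) array of the model's predicted class
    probabilities on the unlabeled pool.
    Returns indices sorted from most to less uncertain.
    """
    eps = 1e-12  # avoid log(0)
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return np.argsort(entropy)[-k:][::-1]

# Toy pool: 4 examples scored by a binary classifier.
pool_probs = np.array([
    [0.99, 0.01],  # very confident
    [0.55, 0.45],  # uncertain
    [0.90, 0.10],  # fairly confident
    [0.51, 0.49],  # most uncertain
])
picked = uncertainty_sample(pool_probs, k=2)  # -> indices [3, 1]
```

In an active-learning loop, the selected indices would be sent for labeling, added to the training set, and the model refinetuned before the next round of selection.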