Active Learning Over Multiple Domains in Natural Language Tasks

Shayne Longpre; Julia Rachel Reisler; Edward Greg Huang; Yi Lu; Andrew Frank; Nikhil Ramesh; Christopher DuBois

Published: 21 Oct 2022, Last Modified: 20 Apr 2025

NeurIPS 2022 Workshop DistShift Poster

Readers: Everyone

Keywords: active learning, domain shift, NLP

TL;DR: A comprehensive analysis of active learning methods in NLP for multiple shifted domains.

Abstract: Studies of active learning traditionally assume the target and source data stem from a single domain. However, in realistic applications, practitioners often require active learning with multiple sources of out-of-distribution data, where it is unclear a priori which data sources will help or hurt the target domain. We survey a wide variety of techniques in active learning (AL), domain shift detection (DS), and multi-domain sampling to examine this challenging setting for question answering and sentiment analysis. Among 18 acquisition functions from 4 families of methods, we find H-Divergence methods, and particularly our proposed variant DAL-E, yield effective results, averaging 2-3% improvements over the random baseline. Our findings yield the first comprehensive analysis of both existing and novel methods for practitioners faced with multi-domain active learning for natural language tasks.

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/active-learning-over-multiple-domains-in/code)
