Evolving Domain Adaptation of Pretrained Language Models for Text Classification

Published: 28 Oct 2023, Last Modified: 02 Apr 2024, DistShift 2023 Poster
Keywords: Evolving Domain Adaptation, Pre-trained Language Model, Continual Domain Adaptation, Unsupervised Domain Adaptation, Time-Series Data, Semi-supervised learning
TL;DR: This paper proposes two dynamic buffer-based adaptation strategies for mitigating evolving domain shift in time-series text classification with pre-trained language models, improving performance over unadapted baselines on real-world social media datasets.
Abstract: Pre-trained language models have shown impressive performance in various text classification tasks. However, the performance of these models is highly dependent on the quality and domain of the labeled examples. In dynamic real-world environments, text content evolves over time, leading to a natural $\textit{evolving domain shift}$. This continuous temporal shift impairs the performance of static models, as their training data becomes increasingly outdated. To address this issue, we propose two dynamic buffer-based adaptation strategies: one utilizes self-training with pseudo-labeling, and the other employs a tuning-free, in-context learning approach for large language models (LLMs). We validate our methods with extensive experiments on two longitudinal real-world social media datasets, demonstrating their superiority compared to unadapted baselines. Furthermore, we introduce a COVID-19 vaccination stance detection dataset, serving as a benchmark for evaluating pre-trained language models within evolving domain adaptation settings.
Submission Number: 23