Identifying the Limits of Cross-Domain Knowledge Transfer for Pretrained Models

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Submitted · Readers: Everyone
Keywords: transfer learning, pretrained language model
Abstract: There is growing evidence that pretrained language models improve task-specific fine-tuning even when the task examples are radically different from those seen in training. What is the nature of this surprising cross-domain transfer? We offer a partial answer via a systematic exploration of how much transfer occurs when models are denied any information about word identity via random scrambling. Across four classification tasks and two sequence labeling tasks, we evaluate LSTMs using GloVe embeddings, BERT, and baseline models. Among these models, only BERT shows high rates of transfer into our scrambled domains, and only for the classification tasks, not the sequence labeling ones. Our analyses seek to explain why transfer succeeds for some tasks but not others, to isolate the separate contributions of pretraining versus fine-tuning, to show that the fine-tuning process is not merely learning to unscramble the scrambled inputs, and to quantify the role of word frequency. These findings help explain where and why cross-domain transfer occurs, which can guide future studies and practical fine-tuning efforts.
One-sentence Summary: We explore how much transfer occurs when models are denied any information about word identity via random scrambling.
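To make the scrambling setup concrete, here is a minimal sketch of one way to deny a model word-identity information by applying a fixed random permutation over the vocabulary. This is an illustrative assumption, not the submission's released code; the function names `build_scramble_map` and `scramble` and the toy vocabulary are hypothetical.

```python
import random

def build_scramble_map(vocab, seed=0):
    """Map each vocabulary item to a randomly chosen item from the same vocabulary,
    applied consistently so surface word identity carries no usable information."""
    rng = random.Random(seed)
    shuffled = list(vocab)
    rng.shuffle(shuffled)
    return dict(zip(vocab, shuffled))

def scramble(tokens, mapping):
    """Apply the fixed token-level scramble map to a tokenized example."""
    return [mapping[t] for t in tokens]

# Toy usage: the same mapping is reused for every example in the scrambled domain.
vocab = ["the", "movie", "was", "great", "terrible", "plot"]
mapping = build_scramble_map(vocab, seed=42)
print(scramble(["the", "movie", "was", "great"], mapping))
```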
Supplementary Material: zip