Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Main
Submission Type: Regular Short Paper
Submission Track: Language Modeling and Analysis of Language Models
Submission Track 2: Multilinguality and Linguistic Diversity
Keywords: crosslingual, knowledge transfer, language model, finetuning
TL;DR: We identify factors most important in crosslingual transfer by performing controlled transfer experiments between pretrained English models and transformed Englishes that vary on just one axis of variation.
Abstract: When we transfer a pretrained language model to a new language, many axes of variation change at once. To disentangle the impact of different factors, such as syntactic similarity and vocabulary similarity, we propose a set of \emph{controlled transfer studies}: we systematically transform the language of the GLUE benchmark, altering one axis of crosslingual variation at a time, and then measure the resulting drops in a pretrained model's downstream performance. We find that models can largely recover from syntactic-style shifts, but cannot recover from vocabulary misalignment or embedding matrix re-initialization, even with continued pretraining on 15 million tokens. Moreover, good-quality tokenizers in the transfer language do not make vocabulary alignment easier. Our experiments provide insights into the factors of crosslingual transfer that researchers should focus on most when designing language transfer scenarios.
Submission Number: 4734
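
The abstract describes transforming GLUE inputs along one axis of crosslingual variation at a time. As a rough illustration only (not the authors' code; the function names and the specific transformation are assumptions), the sketch below applies one such controlled change: a deterministic, bijective remapping of word types, which perturbs the vocabulary while leaving word order and syntax intact.

```python
# Minimal sketch of a single controlled transformation on GLUE-style text:
# remap every word type to another word type, simulating vocabulary
# misalignment without changing syntax. Names here are illustrative.
import random

def build_word_permutation(corpus, seed=0):
    """Build a bijective mapping from each word type to another word type."""
    vocab = sorted({w for sent in corpus for w in sent.split()})
    shuffled = vocab[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(vocab, shuffled))

def transform_sentence(sentence, mapping):
    """Rewrite a sentence with each word replaced by its permuted counterpart."""
    return " ".join(mapping.get(w, w) for w in sentence.split())

# Toy usage: two GLUE-like sentences transformed along the vocabulary axis only.
corpus = ["the movie was great", "the plot was thin"]
mapping = build_word_permutation(corpus)
print([transform_sentence(s, mapping) for s in corpus])
```

Other axes of variation (e.g., syntactic-style shifts such as word reordering, or tokenizer changes) could be sketched analogously, each altering exactly one property of the input language so that the resulting performance drop can be attributed to that property.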