Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Main
Submission Type: Regular Short Paper
Submission Track: Language Modeling and Analysis of Language Models
Submission Track 2: Multilinguality and Linguistic Diversity
Keywords: crosslingual, knowledge transfer, language model, finetuning
TL;DR: We identify factors most important in crosslingual transfer by performing controlled transfer experiments between pretrained English models and transformed Englishes that vary on just one axis of variation.
Abstract: When we transfer a pretrained language model to a new language, many axes of variation change at once. To disentangle the impact of different factors, such as syntactic similarity and vocabulary similarity, we propose a set of \emph{controlled transfer studies}: we systematically transform the language of the GLUE benchmark, altering one axis of crosslingual variation at a time, and then measure the resulting drops in a pretrained model's downstream performance. We find that models can largely recover from syntactic-style shifts, but cannot recover from vocabulary misalignment or embedding matrix re-initialization, even with continued pretraining on 15 million tokens. Moreover, good-quality tokenizers in the transfer language do not make vocabulary alignment easier. Our experiments provide insights into the factors of crosslingual transfer that researchers should focus on most when designing language transfer scenarios.
Submission Number: 4734
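
The abstract describes transforming GLUE inputs along one axis of crosslingual variation at a time. As a rough illustration only (not the authors' code; the function names and the specific transformation are assumptions), the sketch below applies one such controlled change: a deterministic, bijective remapping of word types, which perturbs the vocabulary while leaving word order and syntax intact.

```python
# Minimal sketch of a single controlled transformation on GLUE-style text:
# remap every word type to another word type, simulating vocabulary
# misalignment without changing syntax. Names here are illustrative.
import random

def build_word_permutation(corpus, seed=0):
    """Build a bijective mapping from each word type to another word type."""
    vocab = sorted({w for sent in corpus for w in sent.split()})
    shuffled = vocab[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(vocab, shuffled))

def transform_sentence(sentence, mapping):
    """Rewrite a sentence with each word replaced by its permuted counterpart."""
    return " ".join(mapping.get(w, w) for w in sentence.split())

# Toy usage: two GLUE-like sentences transformed along the vocabulary axis only.
corpus = ["the movie was great", "the plot was thin"]
mapping = build_word_permutation(corpus)
print([transform_sentence(s, mapping) for s in corpus])
```

Other axes of variation (e.g., syntactic-style shifts such as word reordering, or tokenizer changes) could be sketched analogously, each altering exactly one property of the input language so that the resulting performance drop can be attributed to that property.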