Cross-lingual Transfer Learning for Pre-trained Contextualized Language Models

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: transfer learning, pre-trained language models, contextualized language models
Abstract: Though pre-trained contextualized language models (PrLMs) have made a significant impact on NLP, training PrLMs in languages other than English can be impractical for two reasons: other languages often lack corpora sufficient for training powerful PrLMs, and, because of the commonalities among human languages, computationally expensive PrLM training for different languages is somewhat redundant. In this work, building upon recent work connecting cross-lingual transfer learning and neural machine translation, we propose a novel cross-lingual transfer learning framework for PrLMs: \textsc{TreLM}. To handle the differences in symbol order and sequence length between languages, we propose an intermediate ``TRILayer'' structure that learns from these differences and enables better transfer in our primary translation direction, together with a new cross-lingual language modeling objective for transfer training. Additionally, we present an embedding alignment method that adversarially adapts a PrLM's non-contextualized embedding space and the TRILayer structure to learn a text transformation network across languages, addressing the vocabulary differences between languages. Experiments on both language understanding and structure parsing tasks show that the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency. Moreover, despite a minor performance loss compared to pre-training from scratch in resource-rich scenarios, our transfer learning framework is significantly more economical.
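
To illustrate the adversarial embedding alignment idea mentioned in the abstract, below is a minimal sketch of a standard adversarial alignment setup between two embedding spaces: a linear mapping projects source-language embeddings into the target PrLM's embedding space while a discriminator tries to distinguish mapped source embeddings from real target embeddings. The module names, dimensions, and optimizer settings are illustrative assumptions, not the framework's actual implementation.

```python
# Hypothetical sketch of adversarial embedding-space alignment (not TreLM's exact method):
# a linear mapping W projects source-language embeddings into the target PrLM's
# non-contextualized embedding space; a discriminator learns to tell mapped source
# embeddings from real target embeddings, and W is trained to fool it.
import torch
import torch.nn as nn

dim = 768  # assumed PrLM embedding dimension

mapping = nn.Linear(dim, dim, bias=False)        # W: source space -> target space
discriminator = nn.Sequential(                   # scores "real target" vs. "mapped source"
    nn.Linear(dim, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1),
)
bce = nn.BCEWithLogitsLoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(mapping.parameters(), lr=1e-4)

def train_step(src_emb, tgt_emb):
    """One adversarial step on batches of source/target (non-contextualized) embeddings."""
    # Discriminator update: real target embeddings labeled 1, mapped source labeled 0.
    mapped = mapping(src_emb).detach()
    d_loss = bce(discriminator(tgt_emb), torch.ones(len(tgt_emb), 1)) + \
             bce(discriminator(mapped), torch.zeros(len(mapped), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Mapping update: train W so mapped source embeddings are scored as "real target".
    mapped = mapping(src_emb)
    g_loss = bce(discriminator(mapped), torch.ones(len(mapped), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```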
One-sentence Summary: We propose a general cross-lingual transfer learning framework for pre-trained contextualized language models.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=3eGSpOLD3c