Improving Pre-Trained Multilingual Models with Vocabulary Expansion

14 Oct 2021 · OpenReview Archive Direct Upload
Abstract: Recently, pre-trained language models have achieved remarkable success in a broad range of natural language processing tasks. However, in a multilingual setting, it is extremely resource-consuming to pre-train a deep language model over large-scale corpora for each language. Instead of exhaustively pre-training monolingual language models independently, an alternative solution is to pre-train a powerful multilingual deep language model over large-scale corpora in hundreds of languages. However, the vocabulary size for each language in such a model is relatively small, especially for low-resource languages. This limitation inevitably hinders the performance of these multilingual models on tasks such as sequence labeling, wherein in-depth token-level or sentence-level understanding is essential. In this paper, inspired by previous methods designed for monolingual settings, we investigate two approaches (i.e., joint mapping and mixture mapping) based on the pre-trained multilingual model BERT for addressing the out-of-vocabulary (OOV) problem on a variety of tasks, including part-of-speech tagging, named entity recognition, machine translation quality estimation, and machine reading comprehension. Experimental results show that using mixture mapping is more promising. To the best of our knowledge, this is the first work that attempts to address and discuss the OOV issue in multilingual settings.
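
The abstract only names the two mappings, so the sketch below is a minimal, illustrative take on the general mixture-mapping idea: an OOV word is represented as a similarity-weighted mixture of in-vocabulary (subword) embeddings. The function name, the reliance on an external embedding space shared by OOV and in-vocabulary words, the top-k cutoff, and the softmax weighting are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def mixture_mapping_sketch(oov_vec, vocab_vecs, model_embeddings, top_k=5, temperature=1.0):
    """Illustrative sketch (not the paper's exact method).

    oov_vec          : (d,)  vector of the OOV word in an external embedding space (assumed available)
    vocab_vecs       : (V, d) vectors of in-vocabulary words in the same external space
    model_embeddings : (V, h) embeddings of the same in-vocabulary words in the pre-trained model
    Returns an (h,) embedding for the OOV word as a mixture of in-vocabulary embeddings.
    """
    # Cosine similarity between the OOV word and every in-vocabulary word.
    sims = vocab_vecs @ oov_vec
    sims /= np.linalg.norm(vocab_vecs, axis=1) * np.linalg.norm(oov_vec) + 1e-8

    # Keep only the top-k most similar in-vocabulary words (assumed cutoff).
    top = np.argsort(-sims)[:top_k]

    # Softmax over the retained similarities gives the mixture weights.
    weights = np.exp(sims[top] / temperature)
    weights /= weights.sum()

    # The OOV embedding is the weighted average of the model's own embeddings.
    return weights @ model_embeddings[top]

# Toy usage with random data, just to show the shapes involved.
rng = np.random.default_rng(0)
external = rng.normal(size=(1000, 300))   # external vectors for 1,000 in-vocab words
model = rng.normal(size=(1000, 768))      # the model's embeddings for the same words
oov = rng.normal(size=300)                # external vector of an OOV word
print(mixture_mapping_sketch(oov, external, model).shape)  # (768,)
```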