- Keywords: multilingual, natural language processing, embedding alignment, BERT, word embeddings
- TL;DR: We propose procedures for evaluating and strengthening contextual embedding alignment and show that they both improve multilingual BERT's zero-shot XNLI transfer and provide useful insights into the model.
- Abstract: We propose procedures for evaluating and strengthening contextual embedding alignment and show that they are useful in understanding and improving multilingual BERT. In particular, after our proposed alignment procedure, BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model, remarkably matching fully supervised models for Bulgarian and Greek. Further, using non-contextual and contextual versions of word retrieval, we show that BERT outperforms fastText while also distinguishing between multiple uses of a word, suggesting that pre-training subsumes word vectors for learning cross-lingual signals. Finally, we use the contextual word retrieval task to gain a better understanding of the strengths and weaknesses of multilingual pre-training.
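As a rough illustration of the two ideas the abstract mentions, the sketch below pairs a simple alignment step with nearest-neighbor word retrieval. This is not the paper's actual procedure (which fine-tunes the contextual model itself); instead it uses an orthogonal Procrustes map on toy, randomly generated "embeddings" standing in for translation pairs, and scores retrieval by cosine similarity. All names and data here are hypothetical, and only NumPy is assumed.

```python
import numpy as np

def procrustes_align(src, tgt):
    """Learn an orthogonal map W minimizing ||src @ W - tgt||_F
    (the classic Procrustes solution, via SVD of src^T tgt)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def retrieve(queries, targets):
    """Word retrieval: for each query vector, return the index of the
    nearest target vector under cosine similarity."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    return np.argmax(q @ t.T, axis=1)

# Toy setup: the "target-language" space is a random rotation of the
# "source-language" space plus noise, standing in for embeddings of
# known translation pairs.
rng = np.random.default_rng(0)
src = rng.normal(size=(20, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
tgt = src @ rotation + 0.01 * rng.normal(size=(20, 16))

# Align, then evaluate retrieval precision@1 (matched index = correct).
W = procrustes_align(src, tgt)
pred = retrieve(src @ W, tgt)
precision_at_1 = np.mean(pred == np.arange(len(src)))
```

In the contextual variant described in the abstract, the rows would instead be token-level embeddings of aligned words in parallel sentences, so the same word in different contexts yields different query vectors.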