Abstract: Pre-trained multilingual language models such as multilingual BERT and XLM-RoBERTa are reasonably successful at zero-shot cross-lingual transfer because the geometries of their contextual embedding spaces are similar across the donor and recipient languages. However, there has been little research on how the embeddings of individual tokens relate to the final predictions in downstream tasks. In this paper, we investigate the impact of (1) lexical similarity between tokens, (2) differences in tokenization, and (3) similarity of the embedding spaces. We evaluate these factors on zero-shot cross-lingual transfer, using Named Entity Recognition (NER) as the downstream task.
Paper Type: long
Research Area: Multilinguality and Language Diversity