Abstract: Large pre-trained multilingual models such as mBERT and XLM-R have enabled effective cross-lingual zero-shot transfer in many NLP tasks. A cross-lingual adjustment of these models using a small parallel corpus can further improve results. This is a more data-efficient method compared to training a machine-translation system or a multilingual model from scratch using only parallel data. In this study, we experiment with zero-shot transfer of English models to four typologically different languages (Spanish, Russian, Vietnamese, and Hindi) and three NLP tasks (QA, NLI, and NER). We carry out a cross-lingual adjustment of an off-the-shelf mBERT model. We show that this adjustment makes embeddings of semantically similar words from different languages closer to each other, while keeping unrelated words apart. In contrast, fine-tuning mBERT on English data (for a specific task such as NER) draws embeddings of both related and unrelated words closer to each other. The cross-lingual adjustment of mBERT improves NLI in four languages and NER in two languages. However, in the case of QA, performance never improves and sometimes degrades. Moreover, increasing the amount of parallel data is most beneficial for NLI, whereas QA performance peaks at roughly 5K parallel sentences and decreases as the number of parallel sentences grows further.
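To make the cross-lingual adjustment idea concrete, below is a minimal sketch of adjusting an off-the-shelf mBERT checkpoint on a small parallel corpus. It assumes sentence-level alignment and an in-batch contrastive loss that pulls translations together while pushing non-translations apart; the paper's actual objective (e.g., word-level alignment) may differ, and the toy sentence pairs and hyperparameters here are illustrative only.

```python
# Hedged sketch: cross-lingual adjustment of mBERT with a small parallel corpus.
# Assumption: sentence-level alignment with an in-batch contrastive loss;
# the paper's exact (possibly word-level) objective may differ.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased").to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy parallel corpus (English, target language); in practice a few thousand pairs.
parallel_pairs = [
    ("The cat sleeps on the sofa.", "El gato duerme en el sofá."),
    ("I bought a new book yesterday.", "Ayer compré un libro nuevo."),
]

def encode(sentences):
    """Mean-pool mBERT token embeddings into one vector per sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt").to(device)
    hidden = model(**batch).last_hidden_state      # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)   # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)    # (B, H)

model.train()
for epoch in range(3):
    src = encode([p[0] for p in parallel_pairs])
    tgt = encode([p[1] for p in parallel_pairs])
    # Cosine-similarity matrix: aligned translations sit on the diagonal,
    # in-batch negatives off the diagonal (similar pairs pulled closer,
    # unrelated pairs kept apart).
    sim = F.normalize(src, dim=-1) @ F.normalize(tgt, dim=-1).T / 0.05
    labels = torch.arange(sim.size(0), device=device)
    loss = F.cross_entropy(sim, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

After this adjustment step, the model would be fine-tuned on the English task data (QA, NLI, or NER) and evaluated zero-shot on the target language, as described in the abstract.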
Paper Type: long