Abstract: While cross-linguistic model transfer is effective in many settings, there is still limited understanding of the conditions under which it works. In this paper, we focus on assessing the role of lexical semantics in cross-lingual transfer, as we compare its impact to that of other language properties. Examining each language property individually, we systematically analyze how differences between English and a target language influence the capacity to align the language with an English pretrained representation space. We do so by artificially manipulating the English sentences in ways that mimic specific characteristics of the target language, and reporting the effect of each modification on the quality of alignment with the representation space. We show that while properties such as the script or word order only have a limited impact on the alignment quality, the degree of lexical matching between the two languages, which we define using a measure of translation entropy, greatly affects it.
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: low-resource methods for NLP, distillation, cross-lingual NLP, cross-lingual transfer, multilingual representations, polysemy, lexical relationships, natural language inference
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Theory
Languages Studied: English, Spanish, Greek, Hebrew, Hindi, Simplified Chinese
Submission Number: 3453