Abstract: Languages are mortal. While the NLP community is steadily expanding its coverage through multilingual models, many low-resource languages still risk vanishing before any prototype systems are built for them. This paper presents a series of experiments exploring transfer learning for low-resource languages, testing hypotheses about selecting an optimal donor language based on typological relations and grammatical features. Our results show that multilingual models such as mT5 obtain significantly lower perplexity on 45 of 46 low-resource languages without any training on them. We also collected the most typologically diverse multilingual training corpus available, covering 288 languages, drawing on linguistic databases, field linguists' resources, the World Atlas of Language Structures, and Wikipedia.
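The abstract does not spell out how zero-shot perplexity is measured for mT5; the sketch below is one plausible approximation, scoring a text as the decoder target of an off-the-shelf mT5 checkpoint via Hugging Face transformers and taking exp of the mean cross-entropy. The model name and sample sentence are illustrative placeholders, not the paper's actual setup.

```python
# Minimal sketch: zero-shot perplexity-like scoring with mT5.
# Assumption: the text is fed to the encoder and scored as the decoder
# target; perplexity = exp(mean token cross-entropy). This is an
# illustration, not the paper's reported evaluation protocol.
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

model_name = "google/mt5-small"  # assumed checkpoint; any mT5 size works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MT5ForConditionalGeneration.from_pretrained(model_name)
model.eval()

def perplexity(text: str) -> float:
    """Score a sentence with a model that was never fine-tuned on its language."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # Labels are the same ids as the encoder input; the model shifts
        # them internally to build decoder inputs and returns the mean loss.
        out = model(input_ids=enc.input_ids,
                    attention_mask=enc.attention_mask,
                    labels=enc.input_ids)
    return torch.exp(out.loss).item()

# Hypothetical usage on a sentence from a low-resource-language corpus.
print(perplexity("Example sentence in a low-resource language."))
```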
Paper Type: long