African Substrates Rather Than European Lexifiers to Augment African-diaspora Creole Translation

Published: 03 Mar 2023, Last Modified: 16 Apr 2023, Venue: AfricaNLP 2023
Keywords: Machine Translation, Low-resource NLP, Cross-lingual Transfer Learning, African Language NLP, Creole Language NLP
TL;DR: We explore augmenting African-diaspora Creole language MT training data with data from African substrate languages rather than European superstrate languages.
Abstract: Machine translation (MT) model training is difficult for low-resource languages, such as African-diaspora Creole languages, because of data scarcity. Cross-lingual data augmentation with knowledge transfer from related high-resource languages is a common technique for overcoming this disadvantage. For instance, practitioners may transfer knowledge from a language in the same language family as the low-resource language of interest. African-diaspora Creole languages are low-resource and are related to multiple language groups at once. These languages, such as Haitian and Jamaican, are typically lexified by colonial European languages, but they are structurally similar to African languages. We explore the relative advantages of transferring knowledge from the European lexifier language versus from the phylogenetic and typological relatives of the African substrate languages. We analysed Haitian and Jamaican MT in two settings: first controlling tightly for data properties across the compared transfer languages, and then allowing use of all the data we collected. Our inquiry demonstrates a significant advantage for African transfer languages in some settings.
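The controlled setting in the abstract compares transfer languages while holding data properties fixed. The sketch below illustrates one simple version of that idea and is not the authors' pipeline: it assumes the only controlled property is the number of parallel sentence pairs, and it builds one augmented training set per candidate transfer language by adding an equal-size random sample of that language's data to the Creole data. The function name `build_augmented_sets`, its parameters, and the corpus labels used in the example are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# A parallel sentence pair: (source sentence, target sentence).
Pair = Tuple[str, str]


def build_augmented_sets(
    creole_pairs: List[Pair],
    transfer_corpora: Dict[str, List[Pair]],
    seed: int = 0,
) -> Dict[str, List[Pair]]:
    """Build one augmented training set per candidate transfer language.

    Hypothetical sketch: each transfer language contributes the same number
    of sentence pairs (the size of the smallest transfer corpus), so the
    compared systems differ in which language was added, not in how much
    data was added. Raises ValueError if transfer_corpora is empty.
    """
    budget = min(len(pairs) for pairs in transfer_corpora.values())
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    augmented: Dict[str, List[Pair]] = {}
    for lang, pairs in sorted(transfer_corpora.items()):
        sample = rng.sample(pairs, budget)  # equal-size subsample without replacement
        mixed = creole_pairs + sample       # new list; inputs are not mutated
        rng.shuffle(mixed)                  # interleave Creole and transfer pairs
        augmented[lang] = mixed
    return augmented


if __name__ == "__main__":
    # Toy placeholder data; the labels do not name the paper's actual languages.
    creole = [(f"creole {i}", f"english {i}") for i in range(3)]
    corpora = {
        "european_lexifier": [(f"lexifier {i}", f"english {i}") for i in range(10)],
        "african_relative": [(f"relative {i}", f"english {i}") for i in range(4)],
    }
    for lang, data in build_augmented_sets(creole, corpora).items():
        print(lang, len(data))
```

Equalizing the sample size keeps corpus size from confounding the comparison between lexifier and substrate-relative transfer. The "use all collected data" setting would correspond to skipping the subsampling step and concatenating each full transfer corpus.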