Factored Neural Machine Translation on Low Resource Languages in the COVID-19 crisis

12 Aug 2020 (modified: 24 May 2023), submitted to NLP-COVID19-EMNLP
Keywords: Factored Neural Machine Translation, COVID-19, Lingala, Nigerian Fulfulde, Kurdish Kurmanji, Kinyarwanda, Luganda
TL;DR: Factored Neural Machine Translation models have been developed for translating COVID-19-related multilingual documents from English into five low-resource languages: Lingala, Nigerian Fulfulde, Kurdish Kurmanji, Kinyarwanda, and Luganda.
Abstract: Factored Neural Machine Translation (NMT) models have been developed for translating COVID-19-related multilingual documents from English into five low-resource languages (Lingala, Nigerian Fulfulde, Kurdish Kurmanji, Kinyarwanda, and Luganda), which are spoken in impoverished and strife-torn regions of Africa and the Middle East. The objective is to ensure that authentic COVID-19 information reaches ordinary people in their own language, particularly those in marginalized communities with limited linguistic resources, so that they can take appropriate measures against the pandemic without falling for the infodemic that has accompanied the COVID-19 crisis. Two NMT systems have been developed for each of the five language pairs: a baseline sequence-to-sequence Transformer model, and a factored NMT model in which the lemma and part-of-speech (POS) tag are attached to each word on the English source side. The motivation behind the factored model is to compensate for the scarcity of linguistic resources by exploiting linguistic features, which also help the model generalize. The factored NMT model was observed to outperform the baseline by around 10% in unigram BLEU score.
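As an illustration of what attaching the lemma and POS tag to each source word can look like, the minimal Python sketch below joins every English token with its lemma and POS tag into a single factored token. It assumes the annotations were already produced by an external tagger (for example spaCy or Stanza); the `factorize` helper and the `|` separator are hypothetical choices, not necessarily the input format used in the paper.

```python
from typing import List

# Hypothetical separator between factors; "|" is an assumed convention here.
FACTOR_SEP = "|"


def factorize(words: List[str], lemmas: List[str], pos_tags: List[str],
              sep: str = FACTOR_SEP) -> str:
    """Join each word with its lemma and POS tag into one factored token.

    Assumes lemmas and POS tags come from an external tagger and are
    aligned one-to-one with the words.
    """
    if not (len(words) == len(lemmas) == len(pos_tags)):
        raise ValueError("words, lemmas and POS tags must be aligned 1:1")
    # A factor containing the separator would make the token ambiguous to split.
    for factor in (*words, *lemmas, *pos_tags):
        if sep in factor:
            raise ValueError(f"factor {factor!r} contains the separator {sep!r}")
    return " ".join(sep.join(triple) for triple in zip(words, lemmas, pos_tags))


if __name__ == "__main__":
    # Hand-annotated example using Universal POS tags (illustrative only).
    words = ["Masks", "reduce", "transmission"]
    lemmas = ["mask", "reduce", "transmission"]
    pos = ["NOUN", "VERB", "NOUN"]
    print(factorize(words, lemmas, pos))
    # Masks|mask|NOUN reduce|reduce|VERB transmission|transmission|NOUN
```

Keeping the factors inside each token leaves the corpus at one sentence per line and the target side unchanged; a toolkit with factored-input support would then typically split each token on the separator and embed each factor separately.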
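Unigram BLEU (BLEU-1) is the clipped unigram precision p1 multiplied by the brevity penalty BP, where BP = 1 if the candidate length c exceeds the reference length r, and exp(1 - r/c) otherwise. The sketch below computes it at corpus level, assuming pre-tokenized text and a single reference per sentence; `bleu1` is a hypothetical helper, and standard tools such as sacreBLEU or NLTK may report slightly different numbers because of their own tokenization and smoothing.

```python
import math
from collections import Counter
from typing import List


def bleu1(hypotheses: List[List[str]], references: List[List[str]]) -> float:
    """Corpus-level unigram BLEU with exactly one reference per hypothesis."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = 0  # clipped unigram matches summed over the corpus
    hyp_len = 0  # total candidate length c
    ref_len = 0  # total reference length r
    for hyp, ref in zip(hypotheses, references):
        ref_counts = Counter(ref)
        # Clip each word's count by how often it appears in the reference.
        matches += sum(min(n, ref_counts[w]) for w, n in Counter(hyp).items())
        hyp_len += len(hyp)
        ref_len += len(ref)
    if hyp_len == 0 or matches == 0:
        return 0.0
    precision = matches / hyp_len
    # Brevity penalty: only candidates shorter than or equal to r are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * precision


if __name__ == "__main__":
    hyp = [["the", "the", "cat"]]
    ref = [["the", "cat", "sat"]]
    print(round(bleu1(hyp, ref), 4))  # 0.6667
```

In the example, "the" is clipped to one match, so p1 = 2/3, and because c = r the brevity penalty is 1.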
