Factored Neural Machine Translation at LoResMT 2019

20 Feb 2024, OpenReview Archive Direct Upload, Readers: Everyone
Abstract: Low-resource languages pose a major challenge for developing machine translation systems because large, accurate parallel datasets are unavailable. In the present work, Factored Neural Machine Translation systems have been developed for the following bidirectional language pairs: English & Bhojpuri, English & Magahi, and English & Sindhi, along with the unidirectional language pair English - Latvian. Both the lemma and the Part of Speech (PoS) tag are attached as factors to each surface-level English word. No factoring has been done on the low-resource language side. The submitted systems were developed using only the parallel datasets provided; no additional parallel or monolingual data were included. All seven systems were evaluated by the LoResMT 2019 organizers in terms of BLEU score, Precision, Recall, and F-measure. Better evaluation scores were obtained by the MT systems in which English is the target language. The reason is that incorporating lemma and PoS tag factors for English words improved vocabulary coverage and also helped generalization. It is expected that incorporating linguistic factors for the low-resource language words would likewise have improved the evaluation scores of the MT systems that have those languages on the target side.
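As a rough illustration of what attaching lemma and PoS factors to English surface words can look like, the sketch below joins each word with its factors into a single token. The `|` separator, the function names `factor_sentence` and `split_factors`, and the hand-annotated example sentence are assumptions made for illustration; the abstract does not state how the submitted systems encoded or produced their factors.

```python
from typing import List, Tuple

# Assumed separator between factors; the abstract does not specify an encoding.
FACTOR_SEP = "|"

Token = Tuple[str, str, str]  # (surface form, lemma, PoS tag)


def factor_sentence(tokens: List[Token], sep: str = FACTOR_SEP) -> str:
    """Join (surface, lemma, pos) triples into one whitespace-separated line."""
    for token in tokens:
        for field in token:
            # A field containing the separator or whitespace would make the
            # factored line ambiguous to split again, so reject it.
            if not field or sep in field or any(ch.isspace() for ch in field):
                raise ValueError(f"invalid factor field: {field!r}")
    return " ".join(sep.join(token) for token in tokens)


def split_factors(line: str, sep: str = FACTOR_SEP) -> List[Tuple[str, ...]]:
    """Recover the factor tuples from a factored line."""
    return [tuple(tok.split(sep)) for tok in line.split()]


# Hand-annotated toy sentence (hypothetical; a real pipeline would obtain
# lemmas and PoS tags from a tagger and lemmatizer).
annotated = [
    ("The", "the", "DT"),
    ("children", "child", "NNS"),
    ("were", "be", "VBD"),
    ("playing", "play", "VBG"),
]

factored = factor_sentence(annotated)
print(factored)
# The|the|DT children|child|NNS were|be|VBD playing|play|VBG
```

In this toy example the inflected forms "children" and "playing" share the lemmas "child" and "play" with their other inflections, which is one way lemma factors can help vocabulary coverage and generalization as the abstract describes.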