Machine Translation for Low-resource Finno-Ugric Languages

Published: 20 Mar 2023, Last Modified: 18 Apr 2023
NoDaLiDa 2023
Readers: Everyone
Keywords: Finno-Ugric, multilingual, machine translation
TL;DR: We create a multilingual benchmark and neural MT systems for several low-resource languages. It turns out that, for low-resource output languages, not every direction in an all-to-all translation setup needs back-translation data.
Abstract: This paper focuses on neural machine translation (NMT) for low-resource Finno-Ugric languages. Our contributions are threefold: (1) we extend existing parallel and monolingual corpora and collect new ones for 20 languages, (2) we expand the 200-language translation benchmark FLORES-200 with manual translations into nine new languages, and (3) we present experiments that use the collected data to create NMT systems for the included languages and investigate the impact of back-translation data on NMT performance for low-resource languages. Experimental results show that a carefully selected, limited set of back-translation directions yields the best translation scores for both high-resource and low-resource output languages.