Machine Translation for Low-resource Finno-Ugric Languages

Published: 2023, Last Modified: 06 Jan 2026, Venue: NoDaLiDa 2023, License: CC BY-SA 4.0
Abstract: This paper focuses on neural machine translation (NMT) for low-resource Finno-Ugric languages. Our contributions are three-fold: (1) we extend existing parallel and monolingual corpora and collect new ones for 20 languages, (2) we expand the 200-language translation benchmark FLORES-200 with manual translations into nine new languages, and (3) we use the collected data to build NMT systems for the included languages and investigate the impact of back-translation data on NMT performance for low-resource languages. Experimental results show that a carefully selected, limited set of back-translation directions yields the best translation scores for both high-resource and low-resource output languages.