Impact of Filtering Generated Pseudo Bilingual Texts in Low-Resource Neural Machine Translation Enhancement: The Case of Persian-Spanish

Benyamin Ahmadnia, Bonnie J. Dorr, Raúl Aranovich

Published: 01 Jan 2021, Last Modified: 09 Dec 2024, ACLING 2021, CC BY-SA 4.0

Abstract: Although the Neural Machine Translation (NMT) framework has been shown to be effective when large amounts of training data are available, it is less effective in low-resource conditions. To improve NMT performance in a low-resource setting, we extend the high-quality training data by generating a pseudo bilingual dataset and then filtering out low-quality alignments using a quality estimation method based on back-translation. We demonstrate that our approach yields significantly higher BLEU scores than a set of baselines.
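
The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the quality estimation metric, the threshold, or the translation systems, so everything below is an assumption made for illustration: `overlap_f1` (a simple unigram F1) stands in for the quality score, `filter_pseudo_pairs` and its `threshold` parameter are hypothetical names, and `back_translate` is a placeholder for whatever target-to-source system is available. The toy strings are English and Spanish only for readability.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def overlap_f1(reference: str, hypothesis: str) -> float:
    """Unigram F1 between two whitespace-tokenized strings (0.0 to 1.0).

    Assumption: a stand-in for the quality estimation score; the abstract
    does not say which metric is used.
    """
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hypothesis.lower().split())
    if not ref_counts or not hyp_counts:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    matched = sum((ref_counts & hyp_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(hyp_counts.values())
    recall = matched / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def filter_pseudo_pairs(
    pairs: Iterable[Tuple[str, str]],
    back_translate: Callable[[str], str],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> List[Tuple[str, str]]:
    """Keep (source, pseudo_target) pairs whose round trip stays close to the source."""
    kept = []
    for source, pseudo_target in pairs:
        # Translate the generated target back into the source language.
        round_trip = back_translate(pseudo_target)
        if overlap_f1(source, round_trip) >= threshold:
            kept.append((source, pseudo_target))
    return kept


# Toy back-translator: a lookup table in place of a real translation model.
toy_back = {
    "el gato duerme": "the cat sleeps",
    "perro azul rápido": "quick blue dog",
}
pairs = [
    ("the cat sleeps", "el gato duerme"),          # faithful pseudo translation
    ("the weather is nice", "perro azul rápido"),  # poor pseudo translation
]
kept = filter_pseudo_pairs(pairs, lambda text: toy_back.get(text, ""))
print(kept)  # [('the cat sleeps', 'el gato duerme')]
```

In this sketch the second pair is dropped because its round trip shares no tokens with the original source sentence. A real setup would likely replace the unigram score with a stronger sentence-level measure and tune the threshold on held-out data.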