Vicomtech@WMT 2024: Shared Task on Translation into Low-Resource Languages of Spain

Published: 01 Jan 2024, Last Modified: 20 Feb 2025, Venue: WMT 2024, License: CC BY-SA 4.0
Abstract: We describe Vicomtech’s participation in the WMT 2024 Shared Task on translation into low-resource languages of Spain. We addressed all three languages of the task, namely Aragonese, Aranese and Asturian, in both constrained and open settings. Our work mainly centred on exploiting different types of corpora via data filtering, selection and combination methods, along with synthetic data generated with translation models based on rules, neural sequence-to-sequence or large language models. We improved or matched the best baselines in all three language pairs and present complementary results on additional test sets.