Dialectal and Low Resource Machine Translation for Aromanian

Published: 01 Jan 2025, Last Modified: 18 May 2025, COLING 2025, CC BY-SA 4.0
Abstract: We present the first neural machine translation system that translates between Romanian, English, and Aromanian, an endangered Eastern Romance language. BLEU scores range from 17 to 32, depending on the translation direction and the genre of the text. Alongside the system, we release the largest known Aromanian-Romanian bilingual corpus, consisting of 80k cleaned sentence pairs. We also present additional tools: a language-agnostic sentence embedder, used for both text mining and automatic evaluation, and a diacritics converter. Lastly, we describe the online deployment of our quantized model in a CPU-driven, limited-resource scenario.