Abstract: Machine Translation (MT) is a crucial field in Natural Language Processing, with recent advances such as the Transformer architecture revolutionizing the task. While MT typically aims for accurate and natural translations, there are cases, such as educational translations, where preserving the original syntactic structure and meaning is paramount. Interlinear translation, exemplified by its application to ancient texts like the Iliad and the Bible, emphasizes this fidelity to the source text's structure.
Despite the importance of interlinear translation, research in automating this process remains limited, particularly for ancient texts. Our work aims to address this gap by evaluating state-of-the-art neural machine translation models on the task of interlinear translation from Ancient Greek to Polish and English. We compare the performance of general-purpose multilingual models with dedicated language models and assess the impact of Part-of-Speech (POS) tags as well as data preprocessing strategies on model performance.
Our contributions include constructing a word-level-aligned parallel corpus of interlinear translations of the Greek New Testament. We fine-tune four base models under various conditions, totaling 144 models, the best of which we make publicly available. Finally, we propose three approaches for encoding morphological information via dedicated embedding layers, which outperform solutions that do not use tags by up to 20% in BLEU score on interlinear translation into both target languages.
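The abstract does not specify how the three tag-encoding approaches work; as a rough illustration of the general idea only, the sketch below shows one common way to inject morphological tags via a dedicated embedding layer, summing a learned tag embedding with the token embedding before the encoder. All names here are hypothetical and this is not the paper's actual method.

```python
# Minimal sketch (PyTorch): a dedicated embedding layer for POS/morphological
# tags whose output is summed with the token embedding. Assumes token and tag
# sequences are aligned position-for-position, as in a word-aligned corpus.
import torch
import torch.nn as nn

class TaggedEmbedding(nn.Module):
    def __init__(self, vocab_size: int, num_tags: int, d_model: int):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.tag_emb = nn.Embedding(num_tags, d_model)  # one vector per tag

    def forward(self, token_ids: torch.Tensor, tag_ids: torch.Tensor) -> torch.Tensor:
        # token_ids, tag_ids: (batch, seq_len); both index aligned positions
        return self.tok_emb(token_ids) + self.tag_emb(tag_ids)

# Usage: the combined embedding replaces the plain token embedding
# at the encoder input of a sequence-to-sequence model.
emb = TaggedEmbedding(vocab_size=32000, num_tags=20, d_model=512)
x = emb(torch.randint(0, 32000, (2, 10)), torch.randint(0, 20, (2, 10)))
```

Alternatives to summation (e.g., concatenating a smaller tag vector and projecting back to d_model) are equally plausible readings of "dedicated embedding layers"; the paper itself compares three such variants.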
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: domain adaptation, multilingual evaluation, less-resourced languages, resources for less-resourced languages
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data resources
Languages Studied: Ancient Greek, English, Polish
Submission Number: 617