Abstract: Machine Translation (MT) is a crucial field in Natural Language Processing, with recent advances such as the Transformer architecture revolutionizing the task. While MT typically aims for accurate and natural translations, there are cases, such as educational translations, where preserving the original syntactic structure and meaning is paramount. Interlinear translation, exemplified by its application to ancient texts like the Iliad and the Bible, emphasizes this fidelity to the source text's structure.
Despite the importance of interlinear translation, research in automating this process remains limited, particularly for ancient texts. Our work aims to address this gap by evaluating state-of-the-art neural machine translation models on the task of interlinear translation from Ancient Greek to Polish and English. We compare the performance of general-purpose multilingual models with dedicated language models and assess the impact of Part-of-Speech (POS) tags as well as data preprocessing strategies on model performance.
Our contributions include constructing a word-level-aligned parallel corpus of interlinear translations of the Greek New Testament. We fine-tune four base models under various conditions, totaling 144 models, the best of which we make publicly available. Finally, we propose three approaches for encoding morphological information via dedicated embedding layers, which outperform solutions that do not use tags by up to 20% in BLEU score on interlinear translation into both target languages.
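The abstract does not specify how the three tag-encoding approaches work; as a rough illustration of the general idea only, the sketch below shows one common way to inject morphological tags via a dedicated embedding layer, summing a learned tag embedding with the token embedding before the encoder. All names here are hypothetical and this is not the paper's actual method.

```python
# Minimal sketch (PyTorch): a dedicated embedding layer for POS/morphological
# tags whose output is summed with the token embedding. Assumes token and tag
# sequences are aligned position-for-position, as in a word-aligned corpus.
import torch
import torch.nn as nn

class TaggedEmbedding(nn.Module):
    def __init__(self, vocab_size: int, num_tags: int, d_model: int):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.tag_emb = nn.Embedding(num_tags, d_model)  # one vector per tag

    def forward(self, token_ids: torch.Tensor, tag_ids: torch.Tensor) -> torch.Tensor:
        # token_ids, tag_ids: (batch, seq_len); both index aligned positions
        return self.tok_emb(token_ids) + self.tag_emb(tag_ids)

# Usage: the combined embedding replaces the plain token embedding
# at the encoder input of a sequence-to-sequence model.
emb = TaggedEmbedding(vocab_size=32000, num_tags=20, d_model=512)
x = emb(torch.randint(0, 32000, (2, 10)), torch.randint(0, 20, (2, 10)))
```

Alternatives to summation (e.g., concatenating a smaller tag vector and projecting back to d_model) are equally plausible readings of "dedicated embedding layers"; the paper itself compares three such variants.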
Paper Type: Long
Research Area: Machine Translation
Research Area Keywords: domain adaptation, multilingual evaluation, less-resourced languages, resources for less-resourced languages
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data resources
Languages Studied: Ancient Greek, English, Polish
Submission Number: 617