Improving Translation between Spanish and Mapudungun through Transfer Learning

ACL ARR 2024 June Submission4966 Authors

16 Jun 2024 (modified: 22 Jul 2024) · ACL ARR 2024 June Submission · License: CC BY 4.0
Abstract: Neural Machine Translation (NMT) systems for lower-resource languages like Mapudungun face significant challenges due to limited training data and linguistic complexity. This project aims to improve translation between Spanish and Mapudungun through transfer learning, leveraging models pre-trained on the Spanish-English and Spanish-Finnish language pairs. Our contributions include demonstrating the effectiveness of transfer learning in this context and providing a comparative analysis of different parent models. Our main findings show that transfer learning enhances translation performance, with little difference in performance between the Spanish-English and Spanish-Finnish parent models. This suggests that factors beyond morphological similarity, such as data quality or tokenization methods, play a crucial role in transfer learning success. We hope these insights pave the way for future research into optimizing translation tools for low-resource languages and into involving communities in the development process.
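As a rough illustration of the parent-to-child transfer learning setup described in the abstract, the sketch below continues training a Spanish-English parent model on a Spanish-Mapudungun parallel corpus. It assumes the Hugging Face transformers and datasets libraries; the Helsinki-NLP/opus-mt-es-en checkpoint, the es_arn_train.tsv file, and all hyperparameters are illustrative assumptions, not the paper's actual configuration or released artifacts.

```python
# Minimal parent-to-child transfer learning sketch (assumptions noted above):
# load a Spanish-English parent model and fine-tune it on a hypothetical
# Spanish-Mapudungun parallel corpus, reusing the parent's subword vocabulary.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

PARENT = "Helsinki-NLP/opus-mt-es-en"  # assumed Spanish-English parent checkpoint

tokenizer = AutoTokenizer.from_pretrained(PARENT)
model = AutoModelForSeq2SeqLM.from_pretrained(PARENT)

# Hypothetical tab-separated corpus with "es" (Spanish) and "arn" (Mapudungun) columns.
raw = load_dataset("csv", data_files={"train": "es_arn_train.tsv"}, delimiter="\t")

def preprocess(batch):
    # Segment both sides with the parent's tokenizer, as in vocabulary-sharing
    # transfer learning setups; Mapudungun reuses the parent's subword model.
    model_inputs = tokenizer(batch["es"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["arn"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="es-arn-child",
    per_device_train_batch_size=16,
    learning_rate=2e-5,          # illustrative hyperparameters
    num_train_epochs=10,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()  # continue training the parent on the low-resource child pair
```

Keeping the parent's tokenizer and embeddings is one common design choice in this kind of transfer; alternatives (e.g., training a joint subword vocabulary over both pairs) are equally plausible and are not specified by the abstract.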
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: cross-lingual transfer, less-resourced languages, endangered languages, indigenous languages, resources for less-resourced languages
Contribution Types: NLP engineering experiment, Reproduction study, Approaches to low-resource settings, Publicly available software and/or pre-trained models
Languages Studied: Spanish, Mapudungun
Submission Number: 4966