UNMUTE: Understanding Multilingual Transfer Learning through Encipherment

ACL ARR 2025 February Submission 746 Authors

11 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract:

Cross-lingual transfer learning has shown promise for low-resource translation, but its effectiveness for extremely low-resource languages, such as indigenous and ancient languages, remains under-explored. This limitation stems from a circular challenge: insufficient data and limited understanding of linguistic features and grammar prevent thorough analysis, which in turn hinders the development of effective methods. This paper identifies key challenges in this domain and introduces a novel analysis technique, UNMUTE (Understanding MUltilingual Transferability through Encipherment), which enciphers well-studied, high-resource text to simulate the challenges posed by extremely low-resource languages. Our framework enables us to systematically and precisely study factors such as the amount of training data and the proportion of unseen characters or (sub)words. Using UNMUTE, we investigate the techniques that enable and constrain effective transfer learning for extremely low-resource machine translation.
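To make the enciphering idea concrete, here is a minimal, hypothetical sketch of how one might simulate an extremely low-resource language by substituting a controllable fraction of a high-resource alphabet with novel symbols. The cipher design, the `unseen_ratio` knob, and the function names are illustrative assumptions, not the paper's actual procedure.

    # Hypothetical sketch: a character-substitution cipher that replaces a
    # controllable fraction of the alphabet with symbols a model has never
    # seen, simulating unseen characters in an extremely low-resource
    # language. Not the paper's actual method; an assumption for illustration.
    import random
    import string

    def build_cipher(alphabet: str, unseen_ratio: float, seed: int = 0) -> dict:
        """Map a fraction of the alphabet to novel symbols; leave the rest intact."""
        rng = random.Random(seed)
        chars = list(alphabet)
        n_unseen = int(len(chars) * unseen_ratio)
        replaced = rng.sample(chars, n_unseen)
        # Private-use-area codepoints stand in for characters the model never saw.
        novel = (chr(0xE000 + i) for i in range(n_unseen))
        table = {c: c for c in chars}
        table.update(dict(zip(replaced, novel)))
        return table

    def encipher(text: str, table: dict) -> str:
        return "".join(table.get(c, c) for c in text)

    # Example: encipher English text with 30% of the alphabet made "unseen".
    table = build_cipher(string.ascii_lowercase, unseen_ratio=0.3)
    print(encipher("the quick brown fox", table))

Because the source text remains well-studied English, the effect of the unseen-character proportion can be varied precisely while everything else is held fixed, which is the kind of controlled study the abstract describes.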

Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: cross-lingual transfer, indigenous languages, endangered languages
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: ancient languages, Akkadian, Amis
Submission Number: 746