Cross-lingual transfer learning has shown promise for low-resource translation, but its effectiveness for extremely low-resource languages, such as indigenous and ancient languages, remains under-explored. This limitation stems from a circular challenge: insufficient data and limited understanding of linguistic features and grammar prevent a thorough analysis, which in turn hinders the development of effective methods. This paper identifies key challenges in this domain and introduces a novel analysis technique, \textbf{UNMUTE} (\textbf{Un}derstanding \textbf{MU}ltilingual \textbf{T}ransferability through \textbf{E}ncipherment), which enciphers well-studied, high-resource text to simulate the challenges posed by extremely low-resource languages. Our framework enables us to systematically and precisely study factors such as the amount of training data and the proportion of unseen characters or (sub)words. Using UNMUTE, we investigate which techniques enable, and which factors constrain, effective transfer learning for extremely low-resource machine translation.
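To make the encipherment idea concrete, below is a minimal sketch of how high-resource text could be enciphered with a controllable proportion of unseen characters. The one-to-one character substitution, the function names (\texttt{build\_cipher}, \texttt{encipher}), the \texttt{unseen\_ratio} parameter, and the choice of Greek letters as novel symbols are all illustrative assumptions, not the paper's actual implementation.

\begin{verbatim}
import random
import string

def build_cipher(alphabet, unseen_ratio, seed=0):
    """Remap a fraction of the alphabet to novel symbols, keeping the rest fixed.

    `unseen_ratio` (assumed parameter) controls how many characters become
    "unseen" to a model pretrained on the original script.
    """
    rng = random.Random(seed)
    chars = list(alphabet)
    n_unseen = round(len(chars) * unseen_ratio)
    remapped = rng.sample(chars, n_unseen)
    # Draw replacements from a disjoint range (here: Greek letters) so
    # enciphered characters never collide with the source alphabet.
    novel = [chr(0x3B1 + i) for i in range(n_unseen)]
    table = {c: c for c in chars}
    table.update(dict(zip(remapped, novel)))
    return table

def encipher(text, table):
    """Apply the substitution cipher; characters outside the table pass through."""
    return "".join(table.get(c, c) for c in text)

# Example: 30% of lowercase Latin characters become unseen symbols.
table = build_cipher(string.ascii_lowercase, unseen_ratio=0.3, seed=42)
print(encipher("the quick brown fox", table))
\end{verbatim}

Because the underlying text is well-studied and parallel data is abundant, sweeping \texttt{unseen\_ratio} (or applying the same idea at the subword level) would let one measure transfer degradation precisely while holding everything else about the language fixed.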