Data-adaptive Transfer Learning for Low-resource Translation: A Case Study in Haitian


16 Nov 2021 (modified: 05 May 2023), ACL ARR 2021 November Blind Submission, Readers: Everyone
Abstract: Multilingual transfer techniques often improve low-resource machine translation (MT). Many of these techniques are applied without considering data characteristics. We show in the context of Haitian-to-English translation that transfer effectiveness is correlated with the amount of training data and the relationships between knowledge-sharing languages. Our experiments suggest that beyond a threshold of authentic data, back-translation augmentation methods are counterproductive, while cross-lingual transfer during training is preferred. We complement this finding by contributing a rule-based French-Haitian orthographic and syntactic engine and a novel method for phonological embedding. When used with multilingual techniques, orthographic transformation significantly improves performance over conventional methods, and phonological transfer greatly improves performance in Jamaican MT.
