Language Fusion for Parameter-Efficient Cross-lingual Transfer

ACL ARR 2024 June Submission1738 Authors

14 Jun 2024 (modified: 18 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0

Abstract: The limited availability of multilingual text corpora for training language models often leads to poor downstream performance in languages other than English, because their representation spaces are undertrained. This under-representation has motivated recent cross-lingual transfer methods to leverage the English representation space, for example by mixing English and non-English tokens at the input or by extending model parameters, which in turn increases computational complexity. To address this, we introduce Fusion for Language Representations (FLARE) in adapters, a method designed to improve both representation quality and downstream performance for languages other than English. FLARE integrates source and target language representations within the bottlenecks of LoRA adapters using lightweight linear transformations. Because the method requires no additional parameters, it remains parameter-efficient while improving transfer performance and further narrowing the performance gap to English. Another key advantage of the proposed latent representation fusion is that it does not increase the number of input tokens, which preserves computational efficiency. Moreover, FLARE is flexible in the types of representations it can integrate; for example, we show that it can fuse latent translations extracted from machine translation models. Our results demonstrate FLARE's effectiveness on natural language understanding tasks, reducing the performance gap to English across all tasks.
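
The idea of fusing source and target representations inside an adapter bottleneck can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the fusion function, so element-wise addition is used here purely as an assumed example of a parameter-free linear fusion, and the function name `lora_fusion_forward`, the shapes, and the assumption that both inputs are aligned to the same sequence length are all hypothetical.

```python
import numpy as np


def lora_fusion_forward(h_tgt, h_src, W, A, B, scale=1.0):
    """Frozen linear layer plus a LoRA update computed from a fused bottleneck.

    h_tgt, h_src: (seq_len, d_in) hidden states for the target-language input
                  and its source-language (e.g. English) counterpart.
    W: (d_in, d_out) frozen pretrained weight.
    A: (d_in, r) LoRA down-projection, B: (r, d_out) LoRA up-projection.
    """
    z_tgt = h_tgt @ A  # target representation in the rank-r bottleneck
    z_src = h_src @ A  # source representation, same shared down-projection
    # ASSUMPTION: fusion by element-wise addition. It adds no parameters and
    # keeps the sequence length unchanged; the paper may use a different one.
    z = z_tgt + z_src
    return h_tgt @ W + scale * (z @ B)


rng = np.random.default_rng(0)
seq_len, d_in, d_out, r = 4, 8, 8, 2
h_tgt = rng.normal(size=(seq_len, d_in))
h_src = rng.normal(size=(seq_len, d_in))
W = rng.normal(size=(d_in, d_out))
A = rng.normal(size=(d_in, r))
B = np.zeros((r, d_out))  # standard LoRA init: the update starts at zero

out = lora_fusion_forward(h_tgt, h_src, W, A, B)
```

In this sketch the output has the same shape as that of the unmodified layer, so the fusion adds neither parameters nor input tokens, which is the property the abstract emphasizes.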

Paper Type: Long

Research Area: Multilingualism and Cross-Lingual NLP

Research Area Keywords: cross-lingual transfer, multilingual representations, less-resourced languages

Contribution Types: NLP engineering experiment

Languages Studied: Acehnese, Arabic, Balinese, Banjarese, Bengali, Buginese, Bulgarian, Chinese, Finnish, French, German, Greek, Hindi, Indonesian, Javanese, Korean, Madurese, Minangkabau, Ngaju, Russian, Spanish, Swahili, Telugu, Thai, Turkish, Urdu, Vietnamese

Submission Number: 1738