Extrapolating Large Language Models to Non-English by Aligning Languages

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
Abstract: Existing large language models (LLMs) show disparate capabilities across languages: their performance on non-English tasks is often much worse than on English tasks. In this paper, we explore extrapolating an LLM's English ability to non-English languages by building semantic alignment across languages. We start by targeting individual languages, performing bilingual multi-task instruction tuning, i.e., tuning the LLM on a bilingual translation task and a bilingual instruction-following task. We then formulate underlying scaling laws to quantify the impact of scaling up translation data and to provide insights for devising multilingual instruction-tuning strategies, e.g., optimizing multilingual data allocation. Experimental results show that our alignment-enhanced LLMs significantly outperform their English-dominated instruction-tuned counterparts on both the translation task and other zero-shot non-English tasks, e.g., question answering, knowledge infilling, and summarization. Our optimized data allocation also helps the LLM achieve better multilingual performance than uniform allocation. Further analysis of the representation space and response content reveals additional evidence of the established language alignment.
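
A minimal sketch of the bilingual multi-task instruction-tuning setup the abstract describes, combining a bilingual translation task with a bilingual instruction-following task into a single tuning corpus. The Alpaca-style instruction/input/output fields, the language pair, the mixing ratio, and all function names are illustrative assumptions, not the paper's actual data format or implementation.

import random

def make_translation_example(src_text, tgt_text, src_lang="English", tgt_lang="Chinese"):
    # Wrap a parallel sentence pair as an instruction-tuning example (assumed format).
    return {
        "instruction": f"Translate the following {src_lang} text into {tgt_lang}.",
        "input": src_text,
        "output": tgt_text,
    }

def make_instruction_example(instruction, response):
    # Wrap an instruction-response pair (in either language) as-is.
    return {"instruction": instruction, "input": "", "output": response}

def build_bilingual_mix(translation_pairs, instruction_pairs, translation_ratio=0.5, seed=0):
    # Interleave the two tasks into one training set at an assumed mixing ratio.
    translation = [make_translation_example(s, t) for s, t in translation_pairs]
    instruction = [make_instruction_example(i, r) for i, r in instruction_pairs]
    n_trans = int(len(instruction) * translation_ratio / (1.0 - translation_ratio))
    mixed = instruction + translation[:n_trans]
    random.Random(seed).shuffle(mixed)
    return mixed

# Toy usage with a single example per task.
mix = build_bilingual_mix(
    translation_pairs=[("Hello, world.", "你好，世界。")],
    instruction_pairs=[("Name the capital of France.", "Paris.")],
)
print(mix)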
Paper Type: long
Research Area: Multilinguality and Language Diversity
Contribution Types: NLP engineering experiment
Languages Studied: English, Arabic, Greek, Hindi, Turkish, Vietnamese, Chinese