Abstract: Large Language Models (LLMs) have developed rapidly in recent years and are regarded as a new milestone in the information age. However, the powerful LLMs currently in vogue predominantly serve speakers of mainstream languages, especially English, and the demand for LLMs native to other languages remains strong. Motivated by this demand, we examine the scaling of multilingual models, focusing on the interplay between language-specific computational requirements and universal scaling laws. Our findings demonstrate that continual pre-training of an English-base model in another language effectively maintains English proficiency while improving performance in the target language, challenging traditional notions of cross-lingual transfer, which is commonly equated with fine-tuning. We propose a strategic approach for efficient multilingual training that balances computational resource allocation against the risk of catastrophic forgetting. Our work helps explain language-independent model scaling behaviors and shows how to turn “outsiders” into “locals” while largely preserving the model’s basic capabilities.
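For illustration only, the following is a minimal sketch (not the authors' code) of the kind of continual pre-training the abstract describes: an English-base causal LM is further pre-trained on a target-language corpus mixed with a small replayed English fraction to mitigate catastrophic forgetting. The checkpoint name, data file paths, and the 90/10 mixing ratio are assumptions, not values from the paper.

```python
# Sketch: continual pre-training of an English-base LM on a mixed corpus.
# All names and hyperparameters below are illustrative assumptions.
from datasets import load_dataset, interleave_datasets
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base_model = "gpt2"  # placeholder for an English-base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical corpora: target-language text plus a replayed English slice.
target = load_dataset("text", data_files="zh_corpus.txt")["train"]
english = load_dataset("text", data_files="en_replay.txt")["train"]

# Weighted mix: mostly target language, a small English replay ratio
# to help preserve the base model's English proficiency.
mixed = interleave_datasets([target, english], probabilities=[0.9, 0.1], seed=0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

mixed = mixed.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ckpt",
                           per_device_train_batch_size=4,
                           learning_rate=2e-5,
                           max_steps=1000),
    train_dataset=mixed,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```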
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment, Theory
Languages Studied: English, Chinese, French, Russian