Abstract: Large Language Models (LLMs) have developed rapidly in recent years and are regarded as a new milestone in the information age. However, the powerful LLMs currently in vogue predominantly serve speakers of mainstream languages, especially English, and the demand for LLMs native to other languages remains strong. Motivated by this demand, we examine the scaling of multilingual models, focusing on the interplay between language-specific computational requirements and universal scaling laws. Our findings demonstrate that continual pre-training of an English-base model in another language effectively maintains English proficiency while improving performance in the target language, challenging traditional notions of cross-lingual transfer, which is commonly equated with fine-tuning. We propose a strategic approach for efficient multilingual training that balances computational resource allocation against the risk of catastrophic forgetting. Our work helps explain language-independent model scaling behaviors and shows how to turn “outsiders” into “locals” while largely preserving the model’s basic capabilities.
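For illustration only, the following is a minimal sketch (not the authors' code) of the kind of continual pre-training the abstract describes: an English-base causal LM is further pre-trained on a target-language corpus mixed with a small replayed English fraction to mitigate catastrophic forgetting. The checkpoint name, data file paths, and the 90/10 mixing ratio are assumptions, not values from the paper.

```python
# Sketch: continual pre-training of an English-base LM on a mixed corpus.
# All names and hyperparameters below are illustrative assumptions.
from datasets import load_dataset, interleave_datasets
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base_model = "gpt2"  # placeholder for an English-base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical corpora: target-language text plus a replayed English slice.
target = load_dataset("text", data_files="zh_corpus.txt")["train"]
english = load_dataset("text", data_files="en_replay.txt")["train"]

# Weighted mix: mostly target language, a small English replay ratio
# to help preserve the base model's English proficiency.
mixed = interleave_datasets([target, english], probabilities=[0.9, 0.1], seed=0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

mixed = mixed.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ckpt",
                           per_device_train_batch_size=4,
                           learning_rate=2e-5,
                           max_steps=1000),
    train_dataset=mixed,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```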
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment, Theory
Languages Studied: English, Chinese, French, Russian