Scaling Cross-lingual Transfer via Continual Pre-training

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Large Language Models (LLMs) have advanced rapidly in recent years and are widely regarded as a milestone of the information age. However, today's most capable LLMs are trained predominantly on mainstream languages, especially English, and the demand for models that serve other languages natively remains strong. Motivated by this demand, we examine the scaling of multilingual models, focusing on the interplay between language-specific computational requirements and universal scaling laws. Our findings demonstrate that continually pre-training an English base model on another language effectively maintains English proficiency while improving target-language performance, challenging the traditional view of cross-lingual transfer as equivalent to fine-tuning. We propose a strategy for efficient multilingual training that balances the allocation of computational resources against the risk of catastrophic forgetting. Our work helps characterize language-independent scaling behaviors and shows how to turn “outsiders” into “locals” while largely preserving a model’s basic capabilities.
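The abstract does not spell out the training recipe, but a minimal sketch of the general setup it describes, continual pre-training of an English base model on target-language data with a small English "replay" mixture to limit catastrophic forgetting, might look like the following. The model name, dataset identifiers, the 3:1 target/English mixing ratio, and all hyperparameters are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of continual pre-training with an English replay mixture.
# Model name, dataset names, mixing ratio, and hyperparameters are placeholders,
# not the paper's actual configuration.
from datasets import load_dataset, interleave_datasets
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "my-org/english-base-7b"  # placeholder: an English-centric base LLM

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Target-language corpus plus a smaller English stream to mitigate forgetting.
target = load_dataset("my-org/zh-webtext", split="train", streaming=True)   # placeholder
english = load_dataset("my-org/en-webtext", split="train", streaming=True)  # placeholder
mixed = interleave_datasets([target, english], probabilities=[0.75, 0.25], seed=0)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

mixed = mixed.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="continual-pretrain",
        per_device_train_batch_size=8,
        gradient_accumulation_steps=16,
        learning_rate=2e-5,   # typically lower than the original pre-training rate
        max_steps=10_000,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=mixed,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The replay probability is the knob that trades compute spent on the target language against retention of English ability; the 25% used here is only an example of that trade-off, not a recommendation from the paper.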
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment, Theory
Languages Studied: English, Chinese, French, Russian