Abstract: Pruning techniques have been studied to construct small models for efficiency, yet cross-lingual effects, i.e., the transferability of performance across languages, remain understudied in this field. In this work, we investigate cross-lingual effects in multilingual large language model compression using iterative pruning and recovery. We employ structured layer pruning with LoRA-based recovery and knowledge distillation, testing whether calibration languages different from the target evaluation languages can preserve multilingual performance. Experiments on Qwen2.5-7B and Llama3.1-8B demonstrate that recovery in any language consistently outperforms no-recovery baselines, with even a low-resource language such as Swahili providing ~5\% improvements. Contrary to expectations, dominant pretraining languages do not always yield the best results: Indonesian achieves the highest performance for Llama3.1-8B, while Japanese performs best for Qwen2.5-7B. Our findings reveal that cross-lingual calibration effectively maintains multilingual capabilities during iterative pruning.
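To make the pipeline concrete, below is a minimal sketch (not the authors' released code) of one prune-and-recover round as described in the abstract: structured removal of decoder layers, followed by LoRA-based recovery with knowledge distillation against the unpruned model on calibration data in a single recovery language. The "drop the last k layers" heuristic, the hyperparameters, and the `RECOVERY_TEXTS` placeholder are assumptions for illustration only.

```python
# Sketch of one iteration of structured layer pruning + LoRA recovery with KD.
# Assumes a Llama/Qwen-style decoder where model.model.layers is a ModuleList.
import copy
import torch
import torch.nn.functional as F
from torch.nn import ModuleList
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_NAME = "Qwen/Qwen2.5-7B"      # or "meta-llama/Llama-3.1-8B"
LAYERS_TO_DROP = 4                  # per-iteration pruning granularity (assumed)
RECOVERY_TEXTS = ["..."]            # calibration sentences in the chosen recovery language

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
teacher = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
teacher.eval()

# --- Structured layer pruning: copy the teacher and remove decoder blocks. ---
# A naive "drop the last k layers" rule stands in for the actual importance criterion.
student = copy.deepcopy(teacher)
kept = list(student.model.layers[:-LAYERS_TO_DROP])
student.model.layers = ModuleList(kept)
student.config.num_hidden_layers = len(kept)

# --- LoRA-based recovery: train low-rank adapters on the remaining layers. ---
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                      lora_dropout=0.05, task_type="CAUSAL_LM")
student = get_peft_model(student, lora_cfg)
student.train()
optimizer = torch.optim.AdamW(
    (p for p in student.parameters() if p.requires_grad), lr=1e-4)

def kd_step(batch_texts, temperature=2.0):
    """One recovery step: language-modeling loss on the calibration language
    plus KL distillation toward the unpruned teacher's logits."""
    enc = tokenizer(batch_texts, return_tensors="pt",
                    padding=True, truncation=True, max_length=512)
    labels = enc["input_ids"].clone()
    with torch.no_grad():
        t_logits = teacher(**enc).logits
    out = student(**enc, labels=labels)
    kd = F.kl_div(F.log_softmax(out.logits / temperature, dim=-1),
                  F.softmax(t_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    loss = out.loss + kd
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

for step in range(100):             # recovery budget per pruning iteration (assumed)
    kd_step(RECOVERY_TEXTS)
# The prune -> LoRA-recover cycle then repeats for further compression; the recovery
# language (e.g., Swahili, Indonesian, Japanese) may differ from the evaluation languages.
```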
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: multilingualism, cross-lingual transfer, pruning, distillation, NLP in resource-constrained settings
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Approaches low compute settings-efficiency
Languages Studied: zh, ru, id, en, es, ar, hi, ja, vi, sw, de, el, ro, th, tr
Submission Number: 1173