Abstract: Pruning techniques have been studied to construct small models for efficiency, yet cross-lingual effects, i.e., the transferability of performance across languages, remain understudied in this field. In this work, we investigate cross-lingual effects in multilingual large language model compression using iterative pruning and recovery. We employ structured layer pruning with LoRA-based recovery and knowledge distillation, testing whether calibration languages different from the target evaluation languages can preserve multilingual performance. Experiments on Qwen2.5-7B and Llama3.1-8B demonstrate that recovery in any language consistently outperforms no-recovery baselines, with even a low-resource language such as Swahili providing ~5\% improvements. Contrary to expectations, dominant pretraining languages do not always yield the best results: Indonesian achieves the highest performance for Llama3.1-8B, while Japanese performs best for Qwen2.5-7B. Our findings reveal that cross-lingual calibration effectively maintains multilingual capabilities during iterative pruning.
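To make the pipeline concrete, below is a minimal sketch (not the authors' released code) of one prune-and-recover round as described in the abstract: structured removal of decoder layers, followed by LoRA-based recovery with knowledge distillation against the unpruned model on calibration data in a single recovery language. The "drop the last k layers" heuristic, the hyperparameters, and the `RECOVERY_TEXTS` placeholder are assumptions for illustration only.

```python
# Sketch of one iteration of structured layer pruning + LoRA recovery with KD.
# Assumes a Llama/Qwen-style decoder where model.model.layers is a ModuleList.
import copy
import torch
import torch.nn.functional as F
from torch.nn import ModuleList
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_NAME = "Qwen/Qwen2.5-7B"      # or "meta-llama/Llama-3.1-8B"
LAYERS_TO_DROP = 4                  # per-iteration pruning granularity (assumed)
RECOVERY_TEXTS = ["..."]            # calibration sentences in the chosen recovery language

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
teacher = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
teacher.eval()

# --- Structured layer pruning: copy the teacher and remove decoder blocks. ---
# A naive "drop the last k layers" rule stands in for the actual importance criterion.
student = copy.deepcopy(teacher)
kept = list(student.model.layers[:-LAYERS_TO_DROP])
student.model.layers = ModuleList(kept)
student.config.num_hidden_layers = len(kept)

# --- LoRA-based recovery: train low-rank adapters on the remaining layers. ---
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                      lora_dropout=0.05, task_type="CAUSAL_LM")
student = get_peft_model(student, lora_cfg)
student.train()
optimizer = torch.optim.AdamW(
    (p for p in student.parameters() if p.requires_grad), lr=1e-4)

def kd_step(batch_texts, temperature=2.0):
    """One recovery step: language-modeling loss on the calibration language
    plus KL distillation toward the unpruned teacher's logits."""
    enc = tokenizer(batch_texts, return_tensors="pt",
                    padding=True, truncation=True, max_length=512)
    labels = enc["input_ids"].clone()
    with torch.no_grad():
        t_logits = teacher(**enc).logits
    out = student(**enc, labels=labels)
    kd = F.kl_div(F.log_softmax(out.logits / temperature, dim=-1),
                  F.softmax(t_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    loss = out.loss + kd
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

for step in range(100):             # recovery budget per pruning iteration (assumed)
    kd_step(RECOVERY_TEXTS)
# The prune -> LoRA-recover cycle then repeats for further compression; the recovery
# language (e.g., Swahili, Indonesian, Japanese) may differ from the evaluation languages.
```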
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: multilingualism, cross-lingual transfer, pruning, distillation, NLP in resource-constrained settings
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Approaches low compute settings-efficiency
Languages Studied: zh, ru, id, en, es, ar, hi, ja, vi, sw, de, el, ro, th, tr
Submission Number: 1173