Lang-Prune: Unlocking Fair and Powerful Pruning for Multilingual Large Language Models

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Model Pruning, Multilingual Large Language Model, Model Compression
TL;DR: Lang-Prune is a multilingual pruning framework that mitigates cross-lingual interference, enabling fair and effective structured and unstructured compression of LLMs.
Abstract: Multilingual large language models (LLMs) are essential for cross-lingual applications, yet pruning them with mixed-language calibration can induce cross-lingual interference, disproportionately degrading certain languages. We introduce \textbf{\textit{Lang-Prune}}, a drop-in, language-aware extension to structured pruning that computes per-language importance on small calibration sets and aggregates it to protect units critical to any language. Evaluated on \texttt{aya-expanse-8b} across nine languages and multiple sparsity levels, Lang-Prune consistently improves both average and worst-case performance. At 70\% sparsity, it reduces average perplexity from 188.49 (the unmodified pruning baseline) to 70.85, surpassing the monolingual baseline (83.08) while also lowering the worst-language error. Interpretability analyses reveal higher retention of language-specific capacity (81\% vs.\ 66\%). Ablations demonstrate robustness across model types (e.g., \texttt{Qwen3-8B}), improved post-training headroom, and strong transfer to out-of-distribution languages. Lang-Prune is compute-efficient and deployment-friendly, requiring only modifications to importance estimation and aggregation while preserving LLM-Pruner's coupled-structure mechanics.
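To make the "per-language importance, aggregated to protect any language" idea concrete, below is a minimal sketch. It is not the paper's implementation: the function names (`taylor_importance`, `lang_aware_scores`), the first-order Taylor criterion, the per-language normalization, and the element-wise max aggregator are all illustrative assumptions standing in for whatever the paper actually uses, and LLM-Pruner's coupled-structure mechanics are not reproduced here.

```python
# Illustrative sketch only; criterion, normalization, and aggregator are
# assumptions, not the paper's exact method.
import torch
import torch.nn as nn

def taylor_importance(layer: nn.Linear, batch: torch.Tensor) -> torch.Tensor:
    """First-order Taylor score |w * grad| per output channel, a common
    LLM-Pruner-style criterion (assumed here for illustration)."""
    layer.zero_grad()
    out = layer(batch)
    out.pow(2).mean().backward()  # proxy loss on this calibration batch
    score = (layer.weight * layer.weight.grad).abs().sum(dim=1)
    return score.detach()

def lang_aware_scores(layer: nn.Linear,
                      calib_sets: dict[str, torch.Tensor]) -> torch.Tensor:
    """Score each language on its own small calibration set, normalize each
    to [0, 1] so no language dominates, then take the element-wise max so a
    unit is kept if *any* language finds it critical."""
    per_lang = []
    for lang, batch in calib_sets.items():
        s = taylor_importance(layer, batch)
        per_lang.append(s / (s.max() + 1e-8))  # per-language normalization
    return torch.stack(per_lang).max(dim=0).values

# Toy usage: nine hypothetical "languages", each with a tiny calibration batch.
torch.manual_seed(0)
layer = nn.Linear(64, 32)
calib = {f"lang{i}": torch.randn(8, 64) for i in range(9)}
scores = lang_aware_scores(layer, calib)

sparsity = 0.5
k = int(sparsity * scores.numel())
prune_idx = scores.argsort()[:k]  # channels with lowest aggregated importance
print("channels to prune:", prune_idx.tolist())
```

The max aggregator (rather than, say, a mean over languages) reflects the stated goal of protecting units critical to any single language; a mean would let high-resource languages dilute scores that are vital for a low-resource one.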
Primary Area: interpretability and explainable AI
Submission Number: 12099