Pruning General Large Language Models into Customized Expert Models

ACL ARR 2025 February Submission6540 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Large Language Models (LLMs) have transformed natural language processing, yet their substantial size often demands significant computational resources. To conserve computing resources and accelerate inference, it is crucial to prune redundant parameters, especially for experienced users who often need expert models tailored to specific downstream scenarios. However, current pruning methods primarily focus on maintaining models' general capabilities, and they either require extensive post-training or perform poorly because of coarse-grained pruning. In this work, we design a $\underline{Cus}$tom $\underline{Prun}$ing method ($\texttt{Cus-Prun}$) to prune a large general model into a smaller, lightweight expert model, which is positioned along the "language", "domain", and "task" dimensions. By identifying and pruning the neurons that are irrelevant to each dimension, $\texttt{Cus-Prun}$ creates expert models without any post-training. Our experiments demonstrate that $\texttt{Cus-Prun}$ consistently outperforms other methods, achieving minimal loss in both expert and general capabilities across models of different families and sizes.
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: Large Language Models, Efficiency, Pruning, Specific Neuron
Languages Studied: English, German, Chinese, Thai
Submission Number: 6540