Keywords: Model Pruning, Large Language Model
TL;DR: We propose an efficient evolutionary optimization framework for adaptive pruning of LLMs.
Abstract: Post-training pruning has gained increasing attention with the rapid growth of large language models (LLMs). However, significant variations in weight distributions across different LLMs make a fixed pruning strategy inadequate for multiple models. In this paper, we propose an efficient evolutionary optimization framework, \textbf{Mecon}, for adaptive LLM pruning. In particular, we design an effective search space built on our \textbf{Me}ta pruning metric to accommodate the diverse weight distributions among LLMs. We then introduce model-wise re\textbf{con}struction error, a lightweight evaluation metric that speeds up the assessment of each search trial. Finally, we leverage Non-dominated Sorting Genetic Algorithm III (NSGA-III) as our search algorithm, handling both the single-objective problem of pruning metric search and the multi-objective problem of layerwise sparsity ratio search, to discover the optimal pruning strategy. We extensively evaluate our framework on LLaMA-1/2/3 and Mistral models across multiple benchmarks. Our results demonstrate that our adaptive pruning metrics consistently outperform existing ones, and that our searched layerwise sparsity ratios improve the effectiveness of other pruning metrics. Furthermore, we validate the cross-task and cross-model generalizability of our pruning metrics, offering a cost-effective solution to streamline the search process. We release our code in the anonymous repository: \textcolor{blue}{\url{https://anonymous.4open.science/r/Mecon-5819}}.
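To make the search setup concrete, below is a minimal sketch of how the multi-objective layerwise sparsity ratio search described in the abstract could be framed with pymoo's NSGA-III implementation. This is an illustration, not the authors' released code: the layer count, sparsity bounds, the two objectives, and the `reconstruction_error` surrogate are all assumptions; in the actual framework the first objective would be the model-wise reconstruction error measured on calibration data after pruning each layer to its candidate ratio.

```python
import numpy as np
from pymoo.algorithms.moo.nsga3 import NSGA3
from pymoo.core.problem import ElementwiseProblem
from pymoo.optimize import minimize
from pymoo.util.ref_dirs import get_reference_directions

N_LAYERS = 32          # e.g., LLaMA-7B has 32 transformer blocks
TARGET_SPARSITY = 0.5  # desired average sparsity across layers

def reconstruction_error(ratios: np.ndarray) -> float:
    # Toy surrogate so the sketch runs end-to-end. In the real search this
    # would prune each layer to ratios[i] sparsity and return the model-wise
    # reconstruction error measured on a small calibration set.
    return float(np.sum((ratios - 0.5) ** 2) + 0.1 * np.var(ratios))

class SparsityRatioSearch(ElementwiseProblem):
    # One decision variable per transformer layer: its sparsity ratio.
    def __init__(self):
        super().__init__(n_var=N_LAYERS, n_obj=2, xl=0.3, xu=0.7)

    def _evaluate(self, x, out, *args, **kwargs):
        # Objective 1: pruning-quality proxy (lower reconstruction error).
        # Objective 2: deviation of the mean ratio from the sparsity budget.
        out["F"] = [reconstruction_error(x),
                    abs(float(x.mean()) - TARGET_SPARSITY)]

# Das-Dennis reference directions for a two-objective NSGA-III run.
ref_dirs = get_reference_directions("das-dennis", 2, n_partitions=12)
algorithm = NSGA3(ref_dirs=ref_dirs, pop_size=100)
result = minimize(SparsityRatioSearch(), algorithm, ("n_gen", 50), seed=1)
print(result.F)  # objective values of the Pareto-optimal ratio vectors
```

NSGA-III's reference directions keep the population spread across the Pareto front, which suits balancing pruning quality against the overall sparsity budget; the single-objective pruning metric search mentioned in the abstract would use the same machinery with one objective.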
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6214