Keywords: Large Language Models, Model Compression
TL;DR: A reinforcement learning-based pruning technique for LLMs that achieves strong performance across various pruning granularities.
Abstract: Large Language Models (LLMs) have achieved significant success in Natural Language Processing (NLP). However, their large model size and high inference cost restrict their application. Pruning is regarded as an effective method to reduce the size of LLMs. Mainstream pruning methods for LLMs typically apply a uniform ratio to all layers or determine layerwise sparsity with simple criteria. Such manually or semi-manually designed pruning strategies often lead to suboptimal results, which makes reinforcement learning a feasible solution. However, current reinforcement learning-based pruning methods usually rely on redundant environment designs or multiple agents, rendering them ill-suited to massive LLMs. Hence, we propose SparsitySolver, the first method to incorporate reinforcement learning into the pruning of LLMs while supporting various pruning granularities. SparsitySolver employs an improved reinforcement learning environment, allowing a rapid pruning strategy search with a small-scale agent. Moreover, to mitigate the performance decline caused by structured pruning, we propose a compensation method that restores performance without introducing additional parameters into the model. We evaluate our approach on LLaMA-V1/V2, Mistral, and the OPT families across multiple pruning granularities, achieving performance that surpasses state-of-the-art methods.
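To make the search setup concrete, below is a minimal, hypothetical sketch of reinforcement-learning-based layerwise sparsity allocation: a small Gaussian policy (trained with plain REINFORCE) proposes per-layer pruning ratios, which are projected onto a global sparsity budget and scored with a toy sensitivity-based proxy reward. The layer sensitivities, proxy reward, and all names below are illustrative assumptions; the paper's actual environment, agent, and reward are not specified in this abstract.

```python
import numpy as np

# Hypothetical stand-ins: per-layer "sensitivity" values acting as a proxy for
# the perplexity impact of pruning each layer. A real search would evaluate
# the pruned LLM instead.
rng = np.random.default_rng(0)
n_layers = 32
sensitivity = rng.uniform(0.5, 2.0, size=n_layers)
target_sparsity = 0.5  # desired average sparsity across all layers

def project_to_budget(ratios, target):
    """Shift per-layer ratios so their mean matches the global budget, then clip."""
    return np.clip(ratios + (target - ratios.mean()), 0.0, 1.0)

def proxy_reward(ratios):
    """Toy reward: less sensitivity-weighted pruning damage is better."""
    return -np.sum(sensitivity * ratios ** 2)

# Small-scale agent: an independent Gaussian policy over layerwise sparsity,
# trained with vanilla REINFORCE and a moving-average baseline.
mean = np.full(n_layers, target_sparsity)
log_std = np.full(n_layers, -1.5)
lr, baseline = 1e-2, None

for step in range(3000):
    std = np.exp(log_std)
    raw = mean + std * rng.standard_normal(n_layers)   # sample an action
    ratios = project_to_budget(raw, target_sparsity)   # enforce the global budget
    r = proxy_reward(ratios)
    baseline = r if baseline is None else 0.9 * baseline + 0.1 * r
    adv = r - baseline
    # REINFORCE gradients of log N(raw; mean, std) w.r.t. mean and log_std.
    grad_mean = (raw - mean) / std ** 2
    grad_log_std = (raw - mean) ** 2 / std ** 2 - 1.0
    mean += lr * adv * grad_mean
    log_std += lr * adv * grad_log_std

best = project_to_budget(mean, target_sparsity)
print("learned layerwise sparsity:", np.round(best, 3))
print("mean sparsity:", round(float(best.mean()), 3))
```

In this toy run, higher-sensitivity layers end up with lower sparsity while the average stays at the budget, which is the kind of non-uniform layerwise allocation a pruning strategy search is meant to discover.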
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5486