Fast and Effective Weight Update for Pruned Large Language Models

Published: 23 Jul 2024, Last Modified: 17 Sept 2024, Accepted by TMLR, License: CC BY 4.0
Abstract: Pruning large language models (LLMs) is a challenging task due to their enormous size. The primary difficulty is fine-tuning the model after pruning, which is needed to recover the performance lost by removing weights. Recent approaches have either ignored fine-tuning entirely, focusing on efficient pruning criteria, or attempted layer-wise weight updates that preserve the behavior of each layer. However, even layer-wise weight updates can be costly for LLMs, and previous works have resorted to various approximations. In our paper, we propose a fast and effective weight update algorithm for pruned layers based on the Alternating Direction Method of Multipliers (ADMM). We further extend it with a simple gradual pruning mask selection and achieve state-of-the-art pruning performance across a wide range of LLMs.
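For intuition, below is a minimal sketch of the kind of ADMM-based layer-wise weight update the abstract describes: given a fixed pruning mask, the remaining weights are adjusted so that the pruned layer's output on calibration data stays close to the dense layer's output. This is an illustrative reconstruction, not the authors' released implementation; the variable names (X, W0, mask, rho, n_iters) and the choice of NumPy are assumptions.

```python
# Illustrative sketch: ADMM update of a pruned linear layer's weights
# so that X @ W approximates X @ W0 while respecting a fixed mask.
import numpy as np

def admm_weight_update(X, W0, mask, rho=1.0, n_iters=20):
    """
    X    : (n_samples, d_in)  calibration inputs to the layer
    W0   : (d_in, d_out)      original dense weights
    mask : (d_in, d_out)      boolean, True = weight is kept
    """
    H = X.T @ X                          # layer-wise Hessian proxy X^T X
    A = H + rho * np.eye(H.shape[0])     # regularized system matrix
    B = H @ W0                           # target term X^T X W0
    A_inv = np.linalg.inv(A)             # factor once, reuse every iteration

    Z = W0 * mask                        # sparse copy of the weights
    U = np.zeros_like(W0)                # scaled dual variable

    for _ in range(n_iters):
        W = A_inv @ (B + rho * (Z - U))  # quadratic subproblem (W-update)
        Z = (W + U) * mask               # project onto the sparsity pattern (Z-update)
        U = U + W - Z                    # dual ascent step
    return Z                             # feasible (masked) solution
```

One reason such an update can be fast is that the system matrix depends only on the calibration data, so it can be formed and factored once per layer and reused across all ADMM iterations; each iteration then costs only matrix multiplications and an elementwise projection.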
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We reworked sections 2.3 and 3 and added formal statements of theoretical claims. We added a short subsection on the explicit differences between our approach and SparseGPT/Wanda. We replaced the word "accurate" with "effective" in the title. We also updated the abstract and introduction.
Code: https://github.com/fmfi-compbio/admm-pruning
Supplementary Material: zip
Assigned Action Editor: ~Gintare_Karolina_Dziugaite1
Submission Number: 2631