Targeted Low-rank Refinement: Enhancing Sparse Neural Networks with Precision

Li Shen; Anke Tang; Xiaoguang Ren; Yong Luo; Han Hu; Xiaochun Cao

24 Sept 2024 (modified: 03 Dec 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: Model Compression, Low-Rank Refinement, Model Pruning

TL;DR: We propose an iterative method for refining pruned neural network weights, aiming to improve model performance while maintaining sparsity.

Abstract: Pruning is a widely used technique for compressing large neural networks: it eliminates weights that have minimal impact on the model's performance. Current pruning methods, exemplified by magnitude pruning, assign an importance score to each weight based on its magnitude and remove weights with scores below a certain threshold. Nonetheless, these methods often create a gap between the original dense model and the pruned sparse model, potentially impairing performance. The gap becomes more pronounced when the sparsity ratio is high. To mitigate this issue, we propose bridging the gap left by pruning with a low-rank approximation of the difference between the dense and sparse weight matrices. Specifically, our method iteratively refines the sparse weight matrix, augmented by a low-rank adjustment. This technique captures and retains essential information that is often lost during pruning, thereby improving the performance of the pruned model. Furthermore, we offer a comprehensive theoretical analysis of our approach, emphasizing its convergence properties and establishing a solid basis for its efficacy. Experimental results on LLaMa models validate its effectiveness on large language models across various pruning techniques and sparsity levels. Our method shows significant improvements: at 50% sparsity, it reduces perplexity by 53.9% compared to conventional magnitude pruning on LLaMa-7B. Furthermore, to achieve a specific performance target, our approach enables an 8.6% reduction in model parameters while maintaining a sparsity ratio of about 50%.
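
The abstract does not spell out the update rule, so the following is only a minimal sketch of one way a "sparse plus low-rank" refinement could look, not the authors' algorithm. It assumes a fixed magnitude-pruning mask, a truncated SVD for the low-rank term, and plain alternating updates; the function names (`magnitude_mask`, `truncated_svd`, `sparse_plus_low_rank`) and the parameters `rank` and `n_iters` are hypothetical and chosen for illustration.

```python
import numpy as np


def magnitude_mask(W, sparsity):
    """Boolean mask that keeps the largest-magnitude entries of W."""
    k = int(round(sparsity * W.size))  # number of entries to drop
    if k == 0:
        return np.ones(W.shape, dtype=bool)
    # Magnitude of the k-th smallest entry; everything at or below it is dropped.
    threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.abs(W) > threshold


def truncated_svd(M, rank):
    """Best rank-`rank` approximation of M in Frobenius norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def sparse_plus_low_rank(W, sparsity=0.5, rank=8, n_iters=20):
    """Approximate W by S + L with S sparse (fixed support) and L low-rank.

    Assumption: simple alternating minimization of ||W - S - L||_F.
    Each update is optimal for its own block, so the error never increases.
    """
    mask = magnitude_mask(W, sparsity)
    S = W * mask              # start from plain magnitude pruning
    L = np.zeros_like(W)
    for _ in range(n_iters):
        L = truncated_svd(W - S, rank)  # low-rank fit of the pruning residual
        S = (W - L) * mask              # refit kept weights, support unchanged
    return S, L


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 48))
    S, L = sparse_plus_low_rank(W, sparsity=0.5, rank=4, n_iters=10)
    pruned = W * magnitude_mask(W, 0.5)
    print("pruning only   :", np.linalg.norm(W - pruned))
    print("sparse+low-rank:", np.linalg.norm(W - S - L))
```

In this sketch the low-rank term adds roughly `rank * (rows + cols)` parameters per matrix, so any comparison with plain pruning should account for that overhead. The reconstruction error here is measured on the weights themselves; the perplexity figures quoted in the abstract come from the authors' experiments and are not reproduced by this toy example.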

Primary Area: foundation or frontier models, including LLMs

Submission Number: 3363