Pushing Gradient towards Zero: A Novel Pruning Method for Large Language Models

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: large language models, model prune, gradient, sparsity
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Recently, large language models (LLMs) have attracted widespread attention due to their dominant performance on complex language modelling tasks. However, because of their massive size, LLMs require huge amounts of GPU resources during inference, which limits their usability. In this paper, we propose an effective pruning method termed PGZ (Pushing Gradient towards Zero), which prunes LLMs in one shot, without any retraining. The method consists of a new gradual pruning method and a novel fine-tuning method in which the gradient is pushed towards zero. More precisely, we construct a loss function based on gradient information and optimize it by implicitly leveraging second-order information. In addition, the inherent nature of PGZ makes it suitable for parallelization. Notably, we conduct a thorough evaluation of PGZ on LLaMA-7B, 13B, 30B, and 65B across various language benchmarks. Experimental results demonstrate that PGZ consistently outperforms existing pruning methods for LLMs in both unstructured and semi-structured (2:4 and 4:8) patterns. PGZ is also competitive on zero-shot tasks and is compatible with weight quantization approaches.
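To make the abstract's core idea concrete, the following is a minimal PyTorch sketch of one plausible reading of "pushing the gradient towards zero" after one-shot pruning: mask a layer, then fine-tune the surviving weights by minimizing the squared norm of a calibration-loss gradient, whose own gradient involves a Hessian-vector product (implicit second-order information). This is an illustrative assumption on a toy layer, not the authors' PGZ algorithm; all names (the mask, calibration inputs, targets) are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy dense layer standing in for one pruned LLM layer (hypothetical setup).
layer = nn.Linear(16, 16, bias=False)
mask = (torch.rand_like(layer.weight) > 0.5).float()   # hypothetical 50% unstructured mask

with torch.no_grad():
    layer.weight.mul_(mask)                             # one-shot pruning step

x = torch.randn(32, 16)                                 # hypothetical calibration inputs
target = torch.randn(32, 16)                            # hypothetical dense outputs to match

opt = torch.optim.Adam(layer.parameters(), lr=1e-2)

for step in range(100):
    opt.zero_grad()
    calib_loss = ((layer(x) - target) ** 2).mean()      # reconstruction loss on calibration data

    # Gradient of the calibration loss w.r.t. the surviving weights,
    # kept in the graph so we can differentiate through it.
    (grad,) = torch.autograd.grad(calib_loss, layer.weight, create_graph=True)

    # Surrogate objective: push the (masked) gradient towards zero.
    # Backpropagating through this squared gradient norm implicitly uses
    # second-order information (a Hessian-vector product).
    grad_loss = ((grad * mask) ** 2).sum()
    grad_loss.backward()

    # Keep pruned weights frozen at zero.
    layer.weight.grad.mul_(mask)
    opt.step()
    with torch.no_grad():
        layer.weight.mul_(mask)
```

Because each layer's surrogate objective depends only on that layer's weights and calibration activations in this sketch, layers could be processed independently, which is consistent with the abstract's claim that the method is suitable for parallelization.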
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7856