MoreauPruner: Robust Structured Pruning of Large Language Models against Weight Perturbations

Zixiao Wang; Jingwei Zhang; Wenqian Zhao; Farzan Farnia; Bei Yu

MoreauPruner: Robust Structured Pruning of Large Language Models against Weight Perturbations

Zixiao Wang, Jingwei Zhang, Wenqian Zhao, Farzan Farnia, Bei Yu

14 Sept 2024 (modified: 27 Nov 2024)ICLR 2025 Conference Withdrawn SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Structured Pruning, Large Language Model, Robustness, Moreau Envelop

Abstract: In the existing model pruning literature, the weight gradient has been extensively utilized to measure the importance of weight, where the gradient is well-known to be sensitive to perturbations. On the other hand, the widely used large language models (LLMs) have several billion model parameters, which could increase the fragility of few-shot gradient pruning. In this work, we experimentally show that one-shot gradient pruning algorithms could lead to unstable results under perturbations to model weights. Even the minor error of switching between data formats bfloat16 and float16 could result in obviously different outcomes. To address such instabilities, we leverage optimization analysis and propose an LLM structural pruning method, called MoreauPruner, with provable robustness against weight perturbations. In MoreauPruner, the model weight importance is estimated based on the neural network's Moreau envelope, which can be flexibly combined with $\ell_1$-norm regularization techniques to induce the sparsity required in the pruning task. We extensively evaluate the MoreauPruner algorithm on several well-known LLMs, including LLaMA-7B, LLaMA-13B, LLaMA3-8B, and Vicuna-7B. Our numerical results suggest the robustness of MoreauPruner against weight perturbation and how robust importance estimation in MoreauPruner contributes to successful accuracy-based scores compared to several existing pruning methods.

Supplementary Material: zip

Primary Area: foundation or frontier models, including LLMs

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 650

Loading