Keywords: LLM pruning
Abstract: Large Language Models (LLMs) achieve strong performance across diverse tasks but face prohibitive computational and memory costs. Pruning offers a promising path by inducing sparsity while preserving architectural flexibility. However, existing methods struggle to balance efficiency and robustness: local metric approaches prune layer by layer but often collapse under high sparsity, whereas global feedback methods enforce consistency at the cost of expensive weight updates or restrictive semi-structured formats. We present \textbf{UniPruning}, a unified post-training pruning framework that combines the speed of local saliency metrics with the stability of global coordination, enabled by mirror-descent-based optimization, all \textbf{without updating model weights}. UniPruning leverages fast layer-wise scoring and a lightweight global controller to allocate a single sparsity budget, supporting both unstructured and semi-structured $N{:}M$ pruning within one framework. After a brief calibration, it can generate pruning masks for arbitrary sparsity levels in one shot and adapts seamlessly to hardware-aware constraints. Extensive experiments on multiple pretrained LLM families and standard benchmarks show that UniPruning consistently delivers competitive or superior perplexity and zero-shot accuracy. Ablation studies further highlight the importance of mirror descent and local saliency anchoring. Overall, UniPruning provides an efficient, principled, and scalable solution for sparsifying large-scale LLMs. We will release the code in the future.
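To make the abstract's pipeline concrete, here is a minimal sketch (not the authors' code) of the general pattern it describes: compute cheap layer-wise saliency scores, let a lightweight global controller spread a single sparsity budget across layers via entropic mirror descent, and emit pruning masks in one shot without touching the weights. The Wanda-style saliency metric, the controller's objective, and all function names below are assumptions made for illustration; the paper's exact formulation may differ.

```python
# Sketch of local-saliency scoring + a mirror-descent global sparsity controller.
# All modeling choices here are illustrative assumptions, not UniPruning's spec.
import torch

def layer_saliency(weight: torch.Tensor, act_norm: torch.Tensor) -> torch.Tensor:
    """Assumed Wanda-style saliency: |W_ij| * ||x_j||, one score per weight."""
    return weight.abs() * act_norm.unsqueeze(0)

def allocate_sparsity(saliencies, global_sparsity=0.5, steps=100, lr=0.5):
    """Entropic mirror descent over per-layer shares of one pruning budget.

    Hypothetical objective for this sketch: reduce the total saliency that
    would be pruned while the parameter-weighted sparsity tracks the budget.
    """
    n_params = torch.tensor([s.numel() for s in saliencies], dtype=torch.float)
    logits = torch.zeros(len(saliencies))          # softmax(logits) = budget share per layer
    total_to_prune = global_sparsity * n_params.sum()
    for _ in range(steps):
        share = torch.softmax(logits, dim=0)
        pruned = (share * total_to_prune).clamp(max=n_params)
        # Gradient proxy: marginal saliency at each layer's current pruning threshold.
        grad = torch.stack([
            torch.kthvalue(s.flatten(), max(int(k.item()), 1)).values
            for s, k in zip(saliencies, pruned)
        ])
        logits = logits - lr * grad                # exponentiated-gradient (mirror descent) step
    share = torch.softmax(logits, dim=0)
    return (share * total_to_prune / n_params).clamp(0.0, 1.0)  # per-layer sparsity ratios

def build_masks(saliencies, layer_sparsity):
    """One-shot masks: keep the highest-saliency weights in each layer."""
    masks = []
    for s, p in zip(saliencies, layer_sparsity):
        k = int(p.item() * s.numel())              # number of weights to prune in this layer
        thresh = torch.kthvalue(s.flatten(), k).values if k > 0 else -float("inf")
        masks.append((s > thresh).float())
    return masks

if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy stand-in for a model: three weight matrices with random activation norms.
    weights = [torch.randn(64, 64), torch.randn(128, 64), torch.randn(64, 128)]
    acts = [torch.rand(w.shape[1]) for w in weights]
    sal = [layer_saliency(w, a) for w, a in zip(weights, acts)]
    ratios = allocate_sparsity(sal, global_sparsity=0.5)
    masks = build_masks(sal, ratios)
    density = sum(m.sum() for m in masks) / sum(m.numel() for m in masks)
    print("per-layer sparsity:", [f"{r.item():.2f}" for r in ratios],
          f"overall density: {density.item():.2f}")
```

A semi-structured $N{:}M$ variant would replace the global thresholding in `build_masks` with a per-group top-$k$ (keep the $N$ highest-saliency weights in every block of $M$), while the same controller still decides how aggressively each layer is pruned.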
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 3692