Keywords: Model Compression; Large Language Models; Explainable ML
Abstract: Large language models (LLMs) demonstrate unprecedented capabilities across diverse applications, yet their extensive parameterization creates substantial computational and memory requirements that hinder practical deployment.
While structured pruning shows promise for LLM compression, existing methods use static masks that cannot adapt to different inputs, limiting performance across diverse tasks.
In this work, we present \textsc{SeAP}, a novel semantic-aware structured pruning framework that adaptively identifies optimal masks from input semantics at the pre-fill stage. Our framework features two key components: (1) an explainability-guided importance estimation scheme that uniquely fuses local and global neuron importance to discover diverse, representative mask patterns from the intrinsic characteristics of the calibration data, and (2) a lightweight router-based module, trained through iterative refinement, that efficiently assigns an optimal mask to each input prompt. Experimental results on LLaMA-2/3, Qwen2, and Phi-2 demonstrate that \textsc{SeAP} outperforms state-of-the-art structured pruning methods across diverse language modeling and commonsense reasoning tasks, achieving competitive performance while reducing memory footprint and inference latency.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 7789
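The abstract describes a lightweight router that, at the pre-fill stage, assigns one of several precomputed structured masks to each input prompt. The submission page does not expose implementation details, so the following is only a minimal PyTorch sketch of that routing idea under stated assumptions: the mask bank is filled with random placeholder masks (in the paper these would instead come from the explainability-guided importance estimation over calibration data), and all names (MaskRouter, MaskedMLP, mask_bank, keep_ratio) are hypothetical.

```python
# Minimal illustrative sketch (not the authors' code): input-adaptive selection of a
# precomputed structured neuron mask at the pre-fill stage.
import torch
import torch.nn as nn


class MaskRouter(nn.Module):
    """Pick one of `num_masks` precomputed neuron masks from a prompt's hidden states."""

    def __init__(self, hidden_dim: int, num_masks: int, intermediate_dim: int,
                 keep_ratio: float = 0.5):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_masks)
        # Placeholder mask bank: random binary masks keeping `keep_ratio` of neurons.
        # In practice these masks would be derived from fused local/global importance scores.
        keep = int(keep_ratio * intermediate_dim)
        bank = torch.zeros(num_masks, intermediate_dim)
        for k in range(num_masks):
            idx = torch.randperm(intermediate_dim)[:keep]
            bank[k, idx] = 1.0
        self.register_buffer("mask_bank", bank)

    def forward(self, prompt_hidden: torch.Tensor) -> torch.Tensor:
        # prompt_hidden: (batch, seq_len, hidden_dim) pre-fill hidden states.
        pooled = prompt_hidden.mean(dim=1)          # (batch, hidden_dim)
        logits = self.classifier(pooled)            # (batch, num_masks)
        choice = logits.argmax(dim=-1)              # hard routing at inference
        return self.mask_bank[choice]               # (batch, intermediate_dim)


class MaskedMLP(nn.Module):
    """A feed-forward block whose intermediate neurons are gated by the routed mask."""

    def __init__(self, hidden_dim: int, intermediate_dim: int):
        super().__init__()
        self.up = nn.Linear(hidden_dim, intermediate_dim)
        self.down = nn.Linear(intermediate_dim, hidden_dim)

    def forward(self, x: torch.Tensor, neuron_mask: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.up(x))                  # (batch, seq_len, intermediate_dim)
        h = h * neuron_mask.unsqueeze(1)            # zero out pruned neurons
        return self.down(h)


if __name__ == "__main__":
    B, T, D, I = 2, 16, 64, 256
    router = MaskRouter(D, num_masks=4, intermediate_dim=I)
    mlp = MaskedMLP(D, I)
    prompt = torch.randn(B, T, D)
    mask = router(prompt)       # chosen once per prompt at pre-fill
    out = mlp(prompt, mask)     # applied throughout decoding
    print(out.shape)            # torch.Size([2, 16, 64])
```

In this sketch the routing is a hard argmax at inference so that only one mask is ever applied per prompt; during router training, a softmax over the logits could be used instead to keep the selection differentiable.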