NeuroSlice: Forward Selection-Based LLM Pruning via Neuron Contribution Decomposition

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: large language models, structured pruning, neuron contribution decomposition
Abstract: Large language models (LLMs) have dramatically advanced natural language processing, but their deployment is often hindered by exorbitant computational and memory demands. LLM pruning offers a promising pathway to efficiency, yet most pruning methods rely on the layer output as the signal for parameter importance estimation. In this work, we revisit this practice and demonstrate that the layer output is not an atomic unit. Leveraging matrix identity transformations, we decompose each layer's output into a sum of individual neuron contributions, thereby reshaping the original token-by-feature tensor into a more granular token-by-feature-by-neuron representation. This decomposition yields much richer pruning signals by explicitly quantifying each neuron's individual contribution to the original layer output, enabling us to recast structured pruning as a neuron subset selection problem. To further optimize pruning ratio allocation, we introduce a layer-adaptive sparsity assignment method that dynamically allocates the global pruning ratio based on empirical reconstruction gains across layers. Our empirical analysis uncovers depth-wise and module-wise redundancy patterns, offering actionable insights for future LLM pruning designs. Through comprehensive experiments on diverse LLM benchmarks, we show that our proposed NeuroSlice method consistently surpasses state-of-the-art structured pruning baselines.
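The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a single linear projection whose output is Y = H W, so that Y equals the sum over neurons k of the outer product of column k of H with row k of W. The function names, tensor shapes, and the greedy squared-error criterion are illustrative assumptions; the abstract does not specify the paper's actual selection rule, so this should not be read as the NeuroSlice algorithm itself.

```python
import numpy as np

def neuron_contributions(H, W):
    """Split a layer output H @ W into per-neuron terms.

    H: (tokens, neurons) hidden activations (assumed shape).
    W: (neurons, features) output projection (assumed shape).
    Returns C of shape (tokens, features, neurons) with
    C[t, f, k] = H[t, k] * W[k, f], so C.sum(axis=2) == H @ W.
    """
    return np.einsum("tk,kf->tfk", H, W)

def forward_select(C, budget):
    """Greedy forward selection over neurons (illustrative criterion).

    At each step, add the neuron whose contribution leaves the smallest
    squared reconstruction error against the full layer output.
    """
    residual = C.sum(axis=2)            # nothing selected yet: residual is Y
    remaining = list(range(C.shape[2]))
    selected = []
    for _ in range(budget):
        errs = [np.sum((residual - C[:, :, k]) ** 2) for k in remaining]
        best = remaining[int(np.argmin(errs))]
        selected.append(best)
        remaining.remove(best)
        residual = residual - C[:, :, best]
    return selected, residual

# Toy data: 6 tokens, 8 neurons, 4 output features (hypothetical sizes).
rng = np.random.default_rng(0)
H = rng.standard_normal((6, 8))
W = rng.standard_normal((8, 4))

C = neuron_contributions(H, W)
kept, residual = forward_select(C, budget=4)
```

Here `kept` holds the indices of the retained neurons and `residual` is the part of the original layer output that the retained subset fails to reconstruct; pruning the remaining neurons corresponds to dropping their columns of `H` and rows of `W`.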
Primary Area: foundation or frontier models, including LLMs
Submission Number: 23861