Keywords: Large Language Model Pruning, Large Language Model Compression
Abstract: Extensive prior work on importance-based pruning relies on first- or second-order Taylor expansions of the loss to score parameters by the estimated loss increase upon removal. However, in large language models with massive parameter counts and multi-layered nonlinear mappings, such approximations inevitably introduce errors. When applied to structured pruning, Taylor-based criteria are typically extended from individual weights to entire neurons or channels by aggregating their sensitivities. While this enables parameter reduction at the structural level, the Taylor expansion is constrained to low orders, since higher-order terms are computationally intractable in large-scale models, which results in inaccurate estimates of the loss change. Moreover, it neglects the hierarchical dependencies of deep models, failing to account for how parameters influence subsequent layers through forward propagation. In particular, the intermediate activations within the feed-forward network (FFN) layer directly characterize how the pre-activation projections transmit information forward, thereby offering a more faithful account of their influence on the model's representations. Therefore, we propose $\textbf{ActTaylor}$, an intermediate $\textbf{act}$ivation-enhanced $\textbf{Taylor}$ criterion for structured pruning, which integrates loss sensitivity with the hierarchical influence of parameters captured through intermediate activations. ActTaylor scores each hidden unit in the FFN by modulating its Taylor-based sensitivity with its activation statistics, enabling one-shot pruning without any retraining. At pruning ratios of 20\% and 30\%, our method consistently outperforms state-of-the-art structured pruning baselines across seven commonsense benchmarks and one multi-task knowledge benchmark, improving the average accuracy on LLaMA-2 7B by $7.8\%$ and $12.9\%$, and on LLaMA-2 13B by $12.5\%$ and $14.0\%$, respectively.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 10696
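The abstract describes the scoring idea only at a high level, so the following is a minimal sketch of how an activation-modulated first-order Taylor score for FFN hidden units could be computed; the function names, the use of the FFN up-projection weights, the mean-absolute-activation statistic, and the top-k selection are illustrative assumptions, not the authors' exact ActTaylor criterion.

```python
import torch

def acttaylor_scores(w_up, grad_w_up, hidden_acts):
    """Hypothetical sketch of an activation-modulated Taylor score for FFN hidden units.

    w_up:        (d_ff, d_model) up-projection weights of one FFN layer
    grad_w_up:   (d_ff, d_model) loss gradients w.r.t. w_up on a small calibration set
    hidden_acts: (n_tokens, d_ff) intermediate activations of the same FFN layer
    Returns a (d_ff,) importance score per hidden unit.
    """
    # First-order Taylor sensitivity, aggregated over the weights of each hidden unit.
    taylor = (grad_w_up * w_up).abs().sum(dim=1)      # (d_ff,)
    # Activation statistic: mean absolute intermediate activation per hidden unit.
    act_stat = hidden_acts.abs().mean(dim=0)          # (d_ff,)
    # Modulate the Taylor-based sensitivity by the activation statistic.
    return taylor * act_stat

def select_kept_units(scores, prune_ratio=0.2):
    """Keep the highest-scoring hidden units; drop the lowest `prune_ratio` fraction."""
    n_keep = int(scores.numel() * (1 - prune_ratio))
    keep = torch.topk(scores, n_keep).indices
    return torch.sort(keep).values
```

In this sketch the kept indices would then be used to slice the FFN up-, gate-, and down-projection matrices in one shot, with no retraining, matching the one-shot setting stated in the abstract.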