Keywords: uncertainty, large language model, structured pruning
TL;DR: We propose a layer-wise, loss-aligned metric to prune units in neural networks while minimizing performance degradation.
Abstract: Large language models (LLMs) have achieved remarkable performance across diverse tasks, yet their growing size poses significant storage and computational challenges. Model compression, particularly pruning, has emerged as a crucial strategy for reducing memory footprint and computation while preserving predictive performance. In this work, we present LASP, a Loss-Aligned Structured Pruning method that evaluates the contribution of individual model units, such as neurons and attention heads, to overall performance and removes those deemed to be of low importance. By combining the activation magnitudes of model units with their gradients with respect to the loss, LASP defines an importance metric directly aligned with the model's objective, thereby preserving performance. To mitigate the uncertainty introduced by the limited calibration dataset used for importance estimation, LASP incorporates an Upper Confidence Bound (UCB) strategy that refines the selection of low-importance units. In implementation, LASP maintains running statistics with a moving average to reduce storage overhead. Empirical results across diverse LLMs and benchmarks demonstrate that LASP outperforms state-of-the-art baselines, effectively balancing efficiency and performance and thus enabling practical deployment of LLMs.
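The abstract's ingredients (an activation-times-gradient importance score, moving-average running statistics, and a UCB-style selection rule) can be sketched as follows. This is a minimal illustration of the general idea, not the paper's implementation; the class name, momentum scheme, and exploration constant `c` are all assumptions.

```python
import numpy as np

class LossAlignedImportance:
    """Hypothetical sketch of a loss-aligned importance tracker.

    Per calibration batch, each unit's score is |activation * gradient|,
    a first-order estimate of the loss change if the unit is removed.
    Exponential moving averages keep running statistics cheaply, and an
    upper confidence bound (mean + c * std / sqrt(n)) makes pruning
    conservative: a unit is dropped only if even its optimistic
    importance estimate is low.
    """

    def __init__(self, n_units: int, momentum: float = 0.9, c: float = 1.0):
        self.mean = np.zeros(n_units)      # EMA of scores
        self.sq_mean = np.zeros(n_units)   # EMA of squared scores
        self.n = 0                         # number of calibration batches
        self.momentum = momentum
        self.c = c

    def update(self, activations: np.ndarray, gradients: np.ndarray) -> None:
        score = np.abs(activations * gradients)
        m = self.momentum
        self.mean = m * self.mean + (1 - m) * score
        self.sq_mean = m * self.sq_mean + (1 - m) * score**2
        self.n += 1

    def ucb(self) -> np.ndarray:
        # Variance from the two running moments; clamp tiny negatives.
        var = np.maximum(self.sq_mean - self.mean**2, 0.0)
        return self.mean + self.c * np.sqrt(var / max(self.n, 1))

    def prune_mask(self, keep_ratio: float) -> np.ndarray:
        # Keep the units with the highest UCB scores; prune the rest.
        k = int(round(keep_ratio * self.mean.size))
        keep = np.argsort(self.ucb())[-k:]
        mask = np.zeros(self.mean.size, dtype=bool)
        mask[keep] = True
        return mask
```

In a real setting, `activations` and `gradients` would come from forward/backward passes over the calibration set, aggregated per neuron or attention head rather than per scalar.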
Primary Area: foundation or frontier models, including LLMs
Submission Number: 18212