Keywords: Pruning, One-shot, Initialization, Hessian, Hutchinson, Fisher
TL;DR: Scalable second-order estimates improve pruning across datasets, models, and sparsity regimes.
Abstract: Pruning is a practical approach to mitigating the computational costs and environmental impact of deploying large neural networks (NNs). Early works, such as OBD \citep{lecun1989optimal} and OBS \citep{hassibi1992second}, utilize the Hessian matrix to improve the trade-off between network complexity and performance, demonstrating that second-order information is valuable for pruning. However, computing and storing the full Hessian matrix is infeasible for modern NNs, motivating the use of approximations.
In this work, we revisit one-shot pruning at initialization (PaI) and examine scalable second-order approximations. We focus on unbiased estimators, such as the Empirical Fisher and the Hutchinson diagonal, that capture enough curvature information to improve the identification of structurally important parameters while keeping computational overhead linear.
Across extensive experiments on CIFAR-10/100 and TinyImageNet with ResNet and VGG architectures, we show that incorporating even coarse second-order information consistently improves pruning outcomes compared to first-order methods like SNIP and Hessian-vector product approaches like GraSP.
We also analyze the problem of \textit{layer collapse}, a significant limitation of \textit{data-dependent} pruning methodologies, and demonstrate that simply updating the batch-norm statistics mitigates this problem. Notably, this warm-up phase substantially boosts the performance of the Hutchinson diagonal approximation at high sparsities, allowing it to surpass magnitude pruning after training (PaT), suggesting a way to break through a long-standing barrier for PaI methods \citep{frankle2020pruning} and narrow the performance gap between PaI and PaT.
Our results suggest that scalable second-order approximations effectively balance computational efficiency and accuracy, making them a valuable component of the pruning toolkit.
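To make the Hutchinson diagonal estimator mentioned above concrete, here is a minimal NumPy sketch (not the authors' implementation). It uses the standard identity $\operatorname{diag}(H) = \mathbb{E}[z \odot (Hz)]$ for Rademacher probe vectors $z$; in the pruning setting, $Hz$ would be obtained via a Hessian-vector product rather than an explicit matrix, and the matrix size, sample count, and function names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetric matrix standing in for the loss Hessian, so the
# Monte Carlo estimate can be checked against the true diagonal.
A = rng.standard_normal((4, 4))
H = (A + A.T) / 2

def hutchinson_diag(H, num_samples=200_000, rng=rng):
    """Unbiased Monte Carlo estimate of diag(H).

    Uses Rademacher probes z (entries +-1 with equal probability):
    E[z_i * (H z)_i] = H_ii, since the off-diagonal terms vanish
    in expectation.
    """
    n = H.shape[0]
    Z = rng.choice([-1.0, 1.0], size=(num_samples, n))
    # z * (H z), averaged over probes.
    return ((Z @ H) * Z).mean(axis=0)

est = hutchinson_diag(H)
```

For a real network, each `Z @ H` row would be replaced by one Hessian-vector product (e.g. via Pearlmutter's trick or double backpropagation), which is what keeps the cost linear per probe.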
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 1916