Beyond Magnitude and Gradient: Network Pruning Inspired by Optimization Trajectories

TMLR Paper5393 Authors

16 Jul 2025 (modified: 16 Jul 2025). Under review for TMLR. License: CC BY 4.0
Abstract: Deep neural networks are dramatically over-parameterized and can be pruned without affecting generalization. Existing pruning criteria inspect weights or gradients in isolation and ignore the effect of optimization dynamics on pruning. We introduce Causal Pruning (CP), a method that learns parameter importance directly from the optimization trajectory. We exploit the causal signal hidden in SGD trajectories by treating each weight update as an intervention and measuring its effect on the loss -- observed versus predicted. This view yields two insights: (i) a weight's importance is proportional to the gap between the predicted loss change (via a first-order Taylor estimate) and the observed loss change, and (ii) at convergence, weights whose removal leaves the local basin no sharper -- i.e., does not reduce flatness -- can be pruned without harming generalization. Empirically, we show that causal pruning is comparable to recent state-of-the-art approaches.
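
To make insight (i) concrete, below is a minimal, illustrative sketch of one literal reading of the abstract: each weight's SGD update is applied in isolation as an intervention, the observed loss change is compared with the first-order Taylor prediction g_i * Δw_i, and the absolute gap serves as an importance score. The function name `causal_importance_scores`, the per-weight intervention loop, and the fixed learning rate are assumptions made for illustration only; the paper's actual estimator may differ.

```python
import torch
import torch.nn as nn

def causal_importance_scores(model, loss_fn, x, y, lr=0.1):
    """Hypothetical sketch (not the paper's implementation): score each weight
    by |observed loss change - first-order predicted loss change| when its
    SGD update is applied as an isolated intervention."""
    # Baseline loss and gradients at the current parameters.
    params = list(model.parameters())
    loss0 = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss0, params)
    base = loss0.item()

    scores = []
    with torch.no_grad():
        for p, g in zip(params, grads):
            score = torch.zeros_like(p)
            flat_p, flat_g, flat_s = p.view(-1), g.view(-1), score.view(-1)
            for i in range(flat_p.numel()):
                delta = -lr * flat_g[i]            # SGD update for this single weight
                predicted = (flat_g[i] * delta).item()  # first-order Taylor estimate
                flat_p[i] += delta                 # intervene on this weight only
                observed = loss_fn(model(x), y).item() - base
                flat_p[i] -= delta                 # undo the intervention
                flat_s[i] = abs(observed - predicted)
            scores.append(score)
    return scores

# Toy usage on a small model (exhaustive per-weight interventions are only
# feasible at this scale; shown purely to illustrate the quantity being scored).
model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
scores = causal_importance_scores(model, nn.MSELoss(), x, y)
print([s.shape for s in scores])
```

In practice one would expect such scores to be accumulated over many steps of the training trajectory rather than from a single batch; this sketch only shows the per-update quantity the abstract describes.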
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~William_T_Redman1
Submission Number: 5393