Why Does Pruning during Training Work? A Signal-to-Noise Analysis of Sparse Neural Network Training

Published: 26 May 2026, Last Modified: 02 Jun 2026
Venue: ICML 2026 FoGen Workshop Poster
License: CC BY 4.0
Keywords: sparse training; pruned model
TL;DR: Our paper theoretically analyzes training-time pruning in two-layer ReLU CNNs via a signal-to-noise decomposition, derives conditions under which sparse networks match or surpass the generalization of dense networks, and validates these predictions experimentally.
Abstract: Network pruning is an effective strategy for reducing the computational burden of large neural networks in both training and inference, substantially cutting storage and compute costs while preserving predictive performance. Despite its strong empirical success, pruning during training remains poorly understood theoretically, even in simple settings such as two-layer $\mathrm{ReLU}$ convolutional neural networks (CNNs). Without such understanding, it is unclear when and why training-time pruning is effective, which obscures its underlying value. In this work, we provide a theoretical analysis of pruning during training for practical two-layer $\mathrm{ReLU}$ CNNs using a signal-to-noise decomposition framework. We characterize the learning dynamics induced by dynamic pruning and establish conditions under which sparse pruned networks achieve generalization performance comparable to, or even strictly better than, that of their dense counterparts. To the best of our knowledge, this is the first theoretical study of training-time pruning in $\mathrm{ReLU}$ neural networks, and experiments further validate our theoretical predictions.
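The abstract does not specify the data model or the pruning rule, so the following is only a rough, illustrative sketch under stated assumptions, not the paper's method. It assumes the signal-noise patch model common in theoretical analyses of two-layer ReLU CNNs (one patch carries the label-dependent signal y·μ, the other carries Gaussian noise ξ orthogonal to μ), a fixed second layer of ±1/m, full-batch gradient descent on the logistic loss, and global magnitude pruning with a hypothetical linear sparsity ramp. The helper names (make_data, train_with_pruning, signal_noise_summary) and all hyperparameters are hypothetical. The "signal-to-noise decomposition" is approximated by comparing the coefficients ⟨w_{j,r}, μ⟩ (signal learning) with ⟨w_{j,r}, ξ_i⟩ (noise memorization).

```python
import numpy as np


def make_data(n, d, mu_norm, sigma_p, rng):
    """Hypothetical signal-noise patch data: patch 1 = y * mu, patch 2 = noise xi."""
    mu = np.zeros(d)
    mu[0] = mu_norm                      # assumed: signal lives on coordinate 0
    y = rng.choice([-1.0, 1.0], size=n)  # random binary labels
    xi = rng.normal(0.0, sigma_p, size=(n, d))
    xi[:, 0] = 0.0                       # noise kept orthogonal to the signal
    return mu, y, xi


def forward(W, mu, y, xi):
    """Two-layer ReLU CNN; W has shape (2, m, d), second layer fixed to +1/m and -1/m."""
    m = W.shape[1]
    p1 = y[:, None] * mu[None, :]                      # signal patch, shape (n, d)
    pre1 = np.einsum("jrd,nd->njr", W, p1)             # pre-activations on signal patch
    pre2 = np.einsum("jrd,nd->njr", W, xi)             # pre-activations on noise patch
    F = (np.maximum(pre1, 0) + np.maximum(pre2, 0)).sum(axis=2) / m  # shape (n, 2)
    return F[:, 0] - F[:, 1], p1, pre1, pre2


def logistic_loss(f, y):
    return np.mean(np.logaddexp(0.0, -y * f))          # mean of log(1 + exp(-y f))


def grad(W, mu, y, xi):
    """Gradient of the logistic loss with respect to W."""
    f, p1, pre1, pre2 = forward(W, mu, y, xi)
    n, m = len(y), W.shape[1]
    g = -y * np.exp(-np.logaddexp(0.0, y * f)) / n     # dLoss/df_i, numerically stable
    s = np.array([1.0, -1.0])                          # second-layer signs
    G = (np.einsum("n,njr,nd->jrd", g, (pre1 > 0).astype(float), p1)
         + np.einsum("n,njr,nd->jrd", g, (pre2 > 0).astype(float), xi))
    return G * s[:, None, None] / m


def magnitude_mask(W, sparsity):
    """Keep the largest-magnitude (1 - sparsity) fraction of weights globally."""
    k = int(round((1.0 - sparsity) * W.size))
    mask = np.zeros(W.size, dtype=bool)
    mask[np.argsort(np.abs(W).ravel())[W.size - k:]] = True
    return mask.reshape(W.shape)


def train_with_pruning(mu, y, xi, m=10, lr=0.1, steps=300,
                       sparsity=0.5, prune_every=50, sigma0=0.01, seed=0):
    """Gradient descent with periodic magnitude pruning (assumed schedule)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, sigma0, size=(2, m, mu.size))
    mask = np.ones_like(W, dtype=bool)
    losses = []
    for t in range(steps):
        if t > 0 and t % prune_every == 0:
            # Hypothetical ramp: sparsity grows linearly to its target by mid-training.
            target = sparsity * min(1.0, t / max(1, steps // 2))
            mask = magnitude_mask(W, target) & mask    # pruned weights stay pruned
        W = (W - lr * grad(W, mu, y, xi)) * mask       # keep pruned entries at zero
        losses.append(logistic_loss(forward(W, mu, y, xi)[0], y))
    return W, mask, losses


def signal_noise_summary(W, mu, xi):
    """Mean |<w_{j,r}, mu>| (signal) and mean |<w_{j,r}, xi_i>| (noise)."""
    signal = np.abs(W @ mu).mean()
    noise = np.abs(np.einsum("jrd,nd->jrn", W, xi)).mean()
    return signal, noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, y, xi = make_data(n=40, d=20, mu_norm=2.0, sigma_p=0.5, rng=rng)
    W, mask, losses = train_with_pruning(mu, y, xi)
    print("final sparsity:", 1 - mask.mean())
    print("loss: %.3f -> %.3f" % (losses[0], losses[-1]))
    print("signal vs noise coefficients:", signal_noise_summary(W, mu, xi))
```

Under these assumptions, varying `sparsity`, `sigma_p`, or `mu_norm` and comparing the two coefficients is one informal way to probe how pruning shifts the balance between signal learning and noise memorization; the conditions the paper actually derives may differ.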
Submission Number: 69