Pruning Neural Networks at Initialization: Why Are We Missing the Mark?

Jonathan Frankle; Gintare Karolina Dziugaite; Daniel Roy; Michael Carbin

Published: 12 Jan 2021, Last Modified: 22 Jun 2025 | ICLR 2021 Poster | Readers: Everyone

Keywords: Pruning, Sparsity, Lottery Ticket, Science

Abstract: Recent work has explored the possibility of pruning neural networks at initialization. We assess proposals for doing so: SNIP (Lee et al., 2019), GraSP (Wang et al., 2020), SynFlow (Tanaka et al., 2020), and magnitude pruning. Although these methods surpass the trivial baseline of random pruning, they remain below the accuracy of magnitude pruning after training, and we endeavor to understand why. We show that, unlike pruning after training, randomly shuffling the weights these methods prune within each layer or sampling new initial values preserves or improves accuracy. As such, the per-weight pruning decisions made by these methods can be replaced by a per-layer choice of the fraction of weights to prune. This property suggests broader challenges with the underlying pruning heuristics, the desire to prune at initialization, or both.
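The layerwise shuffling ablation described in the abstract can be illustrated with a minimal sketch. It assumes pruning decisions are stored as one binary NumPy mask per layer (1 = keep, 0 = prune); the function names `shuffle_masks_per_layer` and `layer_densities`, the dictionary layout, and the toy masks are hypothetical choices for illustration and are not taken from the authors' code. Shuffling within a layer discards the per-weight decisions while keeping the per-layer fraction of pruned weights unchanged.

```python
import numpy as np


def shuffle_masks_per_layer(masks, seed=0):
    """Randomly permute each layer's binary mask independently.

    masks: dict mapping a layer name to a 0/1 array (1 = keep, 0 = prune).
    Returns new masks with the same shape and the same number of kept
    weights per layer, but with the kept positions chosen at random.
    """
    rng = np.random.default_rng(seed)
    shuffled = {}
    for name, mask in masks.items():
        flat = mask.reshape(-1).copy()  # copy so the input mask is untouched
        rng.shuffle(flat)               # in-place permutation within this layer only
        shuffled[name] = flat.reshape(mask.shape)
    return shuffled


def layer_densities(masks):
    """Fraction of weights kept in each layer."""
    return {name: float(mask.mean()) for name, mask in masks.items()}


# Toy masks standing in for the output of a pruning-at-initialization method.
masks = {
    "conv1": np.array([[1, 0, 0, 1], [0, 0, 1, 0]]),
    "fc": np.array([[1, 1, 0], [0, 1, 1]]),
}
shuffled = shuffle_masks_per_layer(masks)
```

Comparing `layer_densities(masks)` with `layer_densities(shuffled)` shows that the per-layer sparsity is identical before and after shuffling, which is the only information from the original masks that this ablation retains.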

One-sentence Summary: Methods for pruning neural nets at initialization perform the same or better when shuffling or reinitializing the weights they prune in each layer, a way in which they differ from SOTA weight-pruning methods after training.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Community Implementations: [4 code implementations](https://www.catalyzex.com/paper/pruning-neural-networks-at-initialization-why/code)
