On the Predictability of Pruning Across Scales

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: neural networks, deep learning, generalization error, scaling, scalability, pruning
Abstract: We show that the error of iteratively pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task. We functionally approximate the error of the pruned networks, showing that it is predictable in terms of an invariant tying width, depth, and pruning level, such that networks of vastly different sparsities are freely interchangeable. We demonstrate the accuracy of this functional approximation over scales spanning orders of magnitude in depth, width, dataset size, and sparsity. We show that the scaling-law functional form holds (generalizes) for large-scale datasets (CIFAR-10, ImageNet), architectures (ResNets, VGGs), and iterative pruning algorithms (IMP, SynFlow). As neural networks become ever larger and more expensive to train, our findings suggest a framework for reasoning conceptually and analytically about pruning.
One-sentence Summary: We show empirically that the generalization error of pruned networks is predictable, and we specify the scaling law that predicts it across scales.
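As a rough illustration of what "functionally approximating the error of pruned networks" can look like in practice, the sketch below fits a saturating power law to synthetic (density, error) pairs. The functional form `error_vs_density`, its coefficient names, and the data points are assumptions made up for this example; they are not the exact scaling law or the measurements reported in the paper.

```python
# Illustrative sketch only: fit a saturating power-law form to pruned-network
# error as a function of density (fraction of weights kept). The functional
# form and coefficients below are placeholders, not the paper's scaling law.
import numpy as np
from scipy.optimize import curve_fit

def error_vs_density(d, eps_np, c, gamma):
    """Hypothetical form: error equals the unpruned error eps_np at density d=1
    and rises as a saturating power law as density shrinks."""
    return eps_np * ((d ** -gamma + c) / (1.0 + c))

# Synthetic (density, error) pairs standing in for iterative-pruning runs.
densities = np.array([1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125])
errors = np.array([0.080, 0.081, 0.084, 0.092, 0.110, 0.150])

# Fit the placeholder form and use it to extrapolate to a sparser network.
(eps_np, c, gamma), _ = curve_fit(
    error_vs_density, densities, errors, p0=[0.08, 10.0, 1.0], maxfev=10000
)
print(f"fitted: eps_np={eps_np:.3f}, c={c:.2f}, gamma={gamma:.2f}")
print("predicted error at d=0.01:", error_vs_density(0.01, eps_np, c, gamma))
```

The point of the sketch is the workflow the abstract describes (measure error across pruning levels, fit an interpretable functional form, then predict error at unseen sparsities), not the particular parameterization, which here is invented for illustration.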
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2006.10621/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=zOAhNh8HKw