On the Predictability of Pruning Across Scales

28 Sept 2020 (modified: 22 Oct 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: neural networks, deep learning, generalization error, scaling, scalability, pruning
Abstract: We show that the error of iteratively pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task. We functionally approximate the error of the pruned networks, showing that it is predictable in terms of an invariant tying width, depth, and pruning level, such that networks of vastly different sparsities are freely interchangeable. We demonstrate the accuracy of this functional approximation over scales spanning orders of magnitude in depth, width, dataset size, and sparsity. We show that the scaling-law functional form holds (generalizes) for large-scale datasets (CIFAR-10, ImageNet), architectures (ResNets, VGGs), and iterative pruning algorithms (IMP, SynFlow). As neural networks become ever larger and more expensive to train, our findings suggest a framework for reasoning conceptually and analytically about pruning.
One-sentence Summary: We show empirically that the generalization error of pruned networks is predictable, and we specify the scaling law that predicts it across scales.
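As a rough illustration of what "functionally approximating the error of pruned networks" can look like in practice, the sketch below fits a saturating power law to synthetic (density, error) pairs. The functional form `error_vs_density`, its coefficient names, and the data points are assumptions made up for this example; they are not the exact scaling law or the measurements reported in the paper.

```python
# Illustrative sketch only: fit a saturating power-law form to pruned-network
# error as a function of density (fraction of weights kept). The functional
# form and coefficients below are placeholders, not the paper's scaling law.
import numpy as np
from scipy.optimize import curve_fit

def error_vs_density(d, eps_np, c, gamma):
    """Hypothetical form: error equals the unpruned error eps_np at density d=1
    and rises as a saturating power law as density shrinks."""
    return eps_np * ((d ** -gamma + c) / (1.0 + c))

# Synthetic (density, error) pairs standing in for iterative-pruning runs.
densities = np.array([1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125])
errors = np.array([0.080, 0.081, 0.084, 0.092, 0.110, 0.150])

# Fit the placeholder form and use it to extrapolate to a sparser network.
(eps_np, c, gamma), _ = curve_fit(
    error_vs_density, densities, errors, p0=[0.08, 10.0, 1.0], maxfev=10000
)
print(f"fitted: eps_np={eps_np:.3f}, c={c:.2f}, gamma={gamma:.2f}")
print("predicted error at d=0.01:", error_vs_density(0.01, eps_np, c, gamma))
```

The point of the sketch is the workflow the abstract describes (measure error across pruning levels, fit an interpretable functional form, then predict error at unseen sparsities), not the particular parameterization, which here is invented for illustration.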
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2006.10621/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=zOAhNh8HKw