How Sparse Can We Prune A Deep Network: A Geometric Viewpoint

23 Sept 2023 (modified: 11 Feb 2024), submitted to ICLR 2024
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Pruning, Statistical Dimension, High-Dimensional Geometry
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Network pruning is an effective measure for alleviating the storage and computational burden of deep neural networks, which arises from their overparameterization. A fundamental question is: how sparsely can we prune a deep network without sacrificing performance? To address this question, we take a first-principles approach: by directly enforcing the sparsity constraint on the original loss function and exploiting the universal \textit{concentration} effect in high dimensions, we characterize the sharp phase-transition point of the pruning ratio, which turns out to equal one minus the normalized squared Gaussian width of a convex set determined by the $l_1$-regularized loss function. Meanwhile, we provide efficient countermeasures to the challenges in computing the involved Gaussian width, including spectrum estimation for a large-scale Hessian matrix and handling the possible non-positive-definiteness of the Hessian. Moreover, through the lens of the pruning-ratio threshold, we identify the key factors that impact pruning performance, thus providing intuitive explanations for many phenomena observed in existing pruning algorithms. Extensive experiments demonstrate that the theoretical pruning-ratio threshold coincides very well with the experimental one. All code is available at: \url{https://anonymous.4open.science/r/Global-One-shot-Pruning-BC7B/}
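The Gaussian width invoked in the abstract, $w(S) = \mathbb{E}_g[\sup_{x \in S} \langle g, x \rangle]$ with $g \sim \mathcal{N}(0, I_d)$, can be estimated by Monte Carlo when the supremum over $S$ has a closed form. The sketch below uses the unit $l_1$ ball as an illustrative stand-in for the paper's loss-determined convex set (for which no closed form is given here); for the $l_1$ ball the supremum equals $\|g\|_\infty$. The final `threshold` line mirrors the abstract's "one minus the normalized squared Gaussian width" formula under this simplifying assumption:

```python
import numpy as np

def gaussian_width_l1_ball(d, n_samples=2000, seed=0):
    # Monte Carlo estimate of the Gaussian width
    #   w(S) = E_g [ sup_{x in S} <g, x> ],  g ~ N(0, I_d).
    # For the unit l1 ball, sup_{||x||_1 <= 1} <g, x> = ||g||_inf,
    # so each sample reduces to the max absolute coordinate of g.
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, d))
    return np.abs(g).max(axis=1).mean()

d = 10_000
w = gaussian_width_l1_ball(d)
# Abstract's formula, applied to this toy set: pruning-ratio
# threshold = 1 - (normalized squared Gaussian width).
threshold = 1.0 - w**2 / d
```

Because $\|g\|_\infty$ concentrates near $\sqrt{2 \log d}$, the normalized squared width of the $l_1$ ball is tiny, reflecting the well-known fact that $l_1$-type constraint sets admit very aggressive sparsification in high dimensions.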
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7446