HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning

23 Sept 2024 (modified: 02 Jan 2025) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Structured Pruning, Training, AutoML
Abstract: Structured pruning is one of the most popular approaches for compressing heavy deep neural networks (DNNs) into compact sub-networks while retaining the original network performance. Existing methods suffer from multi-stage procedures that demand significant engineering effort and human expertise. The Only-Train-Once series (OTOv1-v3) was recently proposed to resolve many of these pain points by streamlining the workflow. However, the built-in sparse optimizers in the OTO series, i.e., the Half-Space Projected Gradient (HSPG) family, have limitations: they require hyper-parameter tuning and provide only implicit control over sparsity exploration, and consequently still rely on intervention by human experts. To address these limitations, we propose a novel Hybrid Efficient Structured Sparse Optimizer (HESSO). HESSO automatically and efficiently trains a DNN within a single run to produce a high-performing sub-network. Meanwhile, it is almost tuning-free and enjoys user-friendly integration into generic training applications. To address another common issue, the irreversible performance collapse observed when pruning some DNNs, we further propose a novel Corrective Redundant Identification Cycle (CRIC) that plugs into HESSO to reliably identify indispensable structures. We numerically demonstrate the efficacy of HESSO and its enhanced version HESSO-CRIC on a variety of applications ranging from computer vision to natural language processing, including large language models. The numerical results show that HESSO achieves performance competitive with various state-of-the-art benchmarks and supports most DNN architectures, while CRIC effectively prevents irreversible performance collapse and further improves HESSO's performance on certain applications.
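To make the "single-run train and prune" idea concrete, the sketch below shows one minimal way such an optimizer could be wrapped around a standard PyTorch training loop. It is not the HESSO algorithm itself: the saliency score here is a plain group L2 norm, pruned channel groups are simply frozen at zero, and all class and parameter names (`TrainAndPruneOptimizer`, `groups_to_prune`, `warmup_steps`, `prune_every`) are illustrative assumptions rather than the paper's actual interface.

```python
# Hypothetical sketch of single-run "train and prune" for structured sparsity.
# NOT the authors' HESSO optimizer; saliency and scheduling are simplified.
import torch
from torch import nn


class TrainAndPruneOptimizer:
    """Wraps a base optimizer and progressively zeroes low-saliency output channels."""

    def __init__(self, model, base_opt, groups_to_prune=8, warmup_steps=100, prune_every=50):
        self.model = model
        self.base_opt = base_opt
        self.groups_to_prune = groups_to_prune   # total channel groups to remove
        self.warmup_steps = warmup_steps         # train densely before any pruning
        self.prune_every = prune_every           # prune one group every K steps
        self.step_count = 0
        self.pruned = []                         # list of (module, output_channel_index)

    @torch.no_grad()
    def _prune_lowest_saliency_group(self):
        # Rank output channels of Linear layers by L2 norm; mark the smallest live one.
        already = {(id(m), i) for m, i in self.pruned}
        candidates = []
        for m in self.model.modules():
            if isinstance(m, nn.Linear):
                for idx, norm in enumerate(m.weight.norm(dim=1)):
                    if (id(m), idx) not in already:
                        candidates.append((norm.item(), m, idx))
        if candidates:
            _, m, idx = min(candidates, key=lambda c: c[0])
            self.pruned.append((m, idx))

    @torch.no_grad()
    def _enforce_zeros(self):
        # Keep pruned channels at exactly zero so the final sub-network stays structured.
        for m, idx in self.pruned:
            m.weight[idx].zero_()
            if m.bias is not None:
                m.bias[idx] = 0.0

    def zero_grad(self):
        self.base_opt.zero_grad()

    def step(self):
        self.base_opt.step()
        self.step_count += 1
        ready = self.step_count > self.warmup_steps and self.step_count % self.prune_every == 0
        if ready and len(self.pruned) < self.groups_to_prune:
            self._prune_lowest_saliency_group()
        self._enforce_zeros()


# Usage sketch: replace the usual optimizer in an otherwise unchanged training loop.
# model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
# opt = TrainAndPruneOptimizer(model, torch.optim.SGD(model.parameters(), lr=0.1))
# ... then call opt.zero_grad(), loss.backward(), opt.step() as usual.
```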
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3237