Balancing training time vs. performance with Bayesian Early Pruning

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: Efficient Training, Multi-Output Gaussian Process, Gaussian Process, Bayesian, Single-shot network pruning, Dynamic Sparse Reparameterization, Lottery Ticket Hypothesis
Abstract: Pruning is an approach to alleviate overparameterization of deep neural networks (DNN) by zeroing out, or pruning, DNN elements with little to no efficacy at a given task. In contrast to related works that prune before or after training, this paper presents a novel method to perform early pruning of DNN elements (e.g., neurons or convolutional filters) during the training process while preserving performance upon convergence. To achieve this, we model the future efficacy of DNN elements in a Bayesian manner, conditioned upon efficacy data collected during training, and prune DNN elements that are predicted to have low efficacy after training completion. Empirical evaluations show that the proposed Bayesian early pruning improves the computational efficiency of DNN training with small sacrifices in performance. Using our approach, we achieve a $48.6\%$ faster training time for ResNet-$50$ on ImageNet while reaching a validation accuracy of $72.5\%$.
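The sketch below illustrates the general idea described in the abstract: fit a probabilistic regressor to each element's efficacy trace observed so far, extrapolate to the end of training, and prune elements whose predicted final efficacy is low. It is a minimal, hypothetical illustration using an independent single-output Gaussian process per element via scikit-learn, not the paper's multi-output GP; all function names, thresholds, and the upper-quantile pruning rule are assumptions for illustration only.

```python
# Hypothetical sketch of Bayesian early pruning (not the authors' implementation).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel


def predict_final_efficacy(efficacy_history, observed_epochs, final_epoch):
    """Fit one GP per DNN element to its efficacy trace and predict efficacy
    at the final training epoch.

    efficacy_history: array of shape (n_observed_epochs, n_elements), e.g.,
        per-filter importance scores logged at training checkpoints.
    """
    X = np.asarray(observed_epochs, dtype=float).reshape(-1, 1)
    kernel = ConstantKernel(1.0) * RBF(length_scale=10.0)
    means, stds = [], []
    for trace in np.asarray(efficacy_history).T:  # one efficacy trace per element
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gp.fit(X, trace)
        mu, sigma = gp.predict(np.array([[final_epoch]]), return_std=True)
        means.append(mu[0])
        stds.append(sigma[0])
    return np.array(means), np.array(stds)


def early_prune_mask(efficacy_history, observed_epochs, final_epoch,
                     threshold=1e-3, quantile=0.95):
    """Prune an element only if even an optimistic (upper-quantile) prediction
    of its final efficacy falls below the threshold (illustrative rule)."""
    mu, sigma = predict_final_efficacy(efficacy_history, observed_epochs, final_epoch)
    upper = mu + norm.ppf(quantile) * sigma
    return upper < threshold  # True = prune this element
```

In this toy version, the GP's predictive uncertainty makes the pruning decision conservative early in training (wide intervals keep elements alive) and progressively more aggressive as more efficacy observations accumulate.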
One-sentence Summary: Our work improves the training efficiency of deep neural networks with minimal performance degradation by pruning ineffectual network elements during training, optimizing the tradeoff between training time and performance.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=5eaDKqsIij