Intragroup sparsity for efficient inference

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: Deep Learning, Model compression, Neural Network Pruning, High Performance Computation
Abstract: This work studies intragroup sparsity, a fine-grained structural constraint on network weight parameters. It eliminates the computational inefficiency that fine-grained sparsity incurs from irregular dataflow, while still achieving high inference accuracy. We present a theoretical analysis of how weight group sizes affect sparsification error, and of how the performance of pruned networks changes with sparsity level. Further, we analyze the inference-time I/O cost of two different strategies for achieving intragroup sparsity, and how the choice of strategy affects I/O cost under mild assumptions on accelerator architecture. Moreover, we present a novel training algorithm that yields models with improved accuracy over the standard training approach under the intragroup sparsity constraint.
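The paper itself is not reproduced on this page, so as an illustration only, here is a minimal NumPy sketch of one common form of intragroup pruning: within every group of g consecutive weights, keep the k largest-magnitude entries and zero the rest. The group size, kept count, flattening order, and the function name intragroup_prune are assumptions for this sketch, not the authors' exact formulation or training algorithm.

```python
import numpy as np

def intragroup_prune(weights: np.ndarray, group_size: int, keep_per_group: int) -> np.ndarray:
    """Zero all but the top-`keep_per_group` magnitudes in each group of `group_size` weights.

    Illustrative sketch only; the paper's actual grouping and training scheme may differ.
    """
    flat = weights.reshape(-1)
    n = flat.size
    pad = (-n) % group_size                          # pad so the length divides evenly into groups
    padded = np.concatenate([flat, np.zeros(pad, dtype=flat.dtype)])
    groups = padded.reshape(-1, group_size)          # one row per weight group

    # Indices of the (group_size - keep_per_group) smallest-magnitude weights in each group.
    drop = np.argsort(np.abs(groups), axis=1)[:, : group_size - keep_per_group]
    np.put_along_axis(groups, drop, 0.0, axis=1)     # zero them out, in place on the padded copy

    return groups.reshape(-1)[:n].reshape(weights.shape)

# Example usage: 2-out-of-4 intragroup sparsity on a small weight matrix.
w = np.random.randn(8, 8).astype(np.float32)
w_sparse = intragroup_prune(w, group_size=4, keep_per_group=2)
```

Because every group retains the same number of nonzeros, the resulting dataflow is regular, which is the property the abstract credits for avoiding the inefficiency of unstructured fine-grained sparsity.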
One-sentence Summary: We perform a theoretical analysis of how intragroup sparsity affects model performance, and present a training algorithm that produces sparse models with state-of-the-art computational efficiency.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=YCsynNwl2A
