Flexible Group-Level Pruning of Deep Neural Networks for Fast Inference on Mobile GPUs

Published: 01 Jan 2019 · Last Modified: 12 May 2023 · CASES (work in progress) 2019
Abstract: Network pruning is a promising compression technique for reducing the computation and memory-access cost of deep neural networks. In this paper, we propose a novel group-level pruning method to accelerate deep neural networks on mobile GPUs, in which several adjacent weights are pruned as a group while high accuracy is preserved. Although several group-level pruning techniques have been proposed, previous techniques cannot achieve the desired accuracy at high sparsity. We therefore propose an unaligned approach that improves the accuracy of the compressed model.
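The abstract describes pruning several adjacent weights as a group. A minimal sketch of what aligned group-level magnitude pruning could look like is shown below; the function name `group_prune`, the fixed group size, and the L2-norm ranking criterion are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch of group-level magnitude pruning: partition each row of
# a weight matrix into groups of `group_size` adjacent weights, then zero out
# the groups with the smallest L2 norms. Assumption: this is NOT the paper's
# unaligned method, just a baseline aligned variant for intuition.
import numpy as np

def group_prune(weights: np.ndarray, group_size: int, sparsity: float) -> np.ndarray:
    """Zero out the weakest groups of `group_size` adjacent weights.

    weights:  2-D matrix (e.g. out_channels x flattened inputs); the second
              dimension must be divisible by group_size.
    sparsity: fraction of groups to prune (0.0 keeps all, 1.0 prunes all).
    """
    out_ch, in_dim = weights.shape
    assert in_dim % group_size == 0
    groups = weights.reshape(out_ch, in_dim // group_size, group_size)
    # Rank groups by their L2 norm and prune the k weakest ones.
    norms = np.linalg.norm(groups, axis=2)
    k = int(sparsity * norms.size)
    if k > 0:
        threshold = np.partition(norms.ravel(), k - 1)[k - 1]
        mask = (norms > threshold)[..., None]  # broadcast mask over group dim
        groups = groups * mask
    return groups.reshape(out_ch, in_dim)
```

Pruning whole groups of adjacent weights, rather than individual weights, keeps the surviving nonzeros in contiguous runs, which maps well onto the vectorized memory accesses of mobile GPUs.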