Abstract: Network pruning is a promising compression technique to reduce computation and memory access cost of deep neural networks. In this paper, we propose a novel group-level pruning method to accelerate deep neural networks on mobile GPUs, where several adjacent weights are pruned in a group while providing high accuracy. Although several group-level pruning techniques have been proposed, the previous techniques can not achieve the desired accuracy at high sparsity. In this paper, we propose a unaligned approach to improve the accuracy of compressed model.
0 Replies
Loading