Abstract: Existing deep neural network (DNN) pruning methods can be classified into two main categories: structured pruning and weight pruning. Structured pruning is a representative model compression technique that reduces the storage and computation requirements of DNNs and accelerates inference; it mainly includes filter pruning and channel pruning. However, both are coarse-grained methods: they can only decide whether to prune an entire filter or channel, which provides a limited decision space. In contrast, stripe-wise pruning offers finer granularity than filter pruning, and shape-wise pruning offers finer granularity than channel pruning. These two fine-grained methods correspond to the two dimensions, rows and columns, of the general matrix multiplication (GEMM) view of convolution. Since combining fine-grained pruning decisions across multiple dimensions yields a larger solution space, in this paper we propose a joint multi-dimensional fine-grained pruning scheme (JFP) for DNN compression, which simultaneously prunes elements within filters and channels. Extensive experiments on the CIFAR-10 dataset demonstrate that (1) JFP achieves more stable pruning ratios than stripe-wise pruning, and (2) JFP effectively compresses DNN parameters and reduces computation while maintaining accuracy compared with its counterparts.
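To make the GEMM view concrete, the following is a minimal NumPy sketch (not the paper's implementation; all sizes, mask placements, and variable names are hypothetical) of how stripe-wise decisions refine rows, shape-wise decisions refine columns, and a joint scheme in the spirit of JFP combines the two masks:

```python
import numpy as np

# GEMM view of convolution: the weight tensor (N filters, C channels, K x K)
# flattens to an (N, C*K*K) matrix; convolution then becomes a matrix multiply
# with the im2col'd input. Rows index filters, columns index weight positions.
N, C, K = 4, 3, 3                            # small illustrative sizes (hypothetical)
W = np.random.randn(N, C, K, K)
W_gemm = W.reshape(N, C * K * K)

# Coarse-grained decisions act on whole rows / column blocks:
#   filter pruning  -> drop an entire row (one filter)
#   channel pruning -> drop a block of K*K columns (one input channel)
# Fine-grained decisions refine both dimensions:
#   stripe-wise: zero W[n, :, i, j], i.e. part of one row (a spatial stripe)
#   shape-wise:  zero W[:, c, i, j], i.e. one single column across all filters
stripe_mask = np.ones((N, K, K), dtype=bool)
stripe_mask[2, 0, 1] = False                 # prune one stripe of filter 2
shape_mask = np.ones((C, K, K), dtype=bool)
shape_mask[1, 2, 2] = False                  # prune one shape position in all filters

# A joint scheme combines both masks elementwise, which yields a larger
# decision space than either dimension alone.
row_mask = np.broadcast_to(stripe_mask[:, None], (N, C, K, K)).reshape(N, -1)
col_mask = shape_mask.reshape(-1)
W_pruned = W_gemm * (row_mask & col_mask[None, :])
print(f"kept weights: {np.count_nonzero(W_pruned)} / {W_gemm.size}")
```

Each mask on its own can only zero structures along one GEMM dimension; taking their elementwise conjunction is what enlarges the solution space that the abstract refers to.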