CUP: Cluster Pruning for Compressing Deep Neural Networks

Rahul Duggal, Cao Xiao, Richard W. Vuduc, Duen Horng Chau, Jimeng Sun

IEEE BigData 2021 (modified: 05 Nov 2022)

Abstract: We propose CUP, a new method for compressing and accelerating deep neural networks. At its core, CUP achieves compression by clustering and pruning similar filters in each layer. For clustering, CUP uses hierarchical clustering, which allows for an elegant parameterization of model capacity through a single hyper-parameter t. We observe that by increasing t, CUP can dynamically reduce model capacity through non-uniform layer-wise pruning, which leads to two advantages. First, CUP can effectively compress a model to within the desired compute budget through a simple line search on t. Second, through a simple extension, CUP can obtain the pruned model in a single training pass, leading to large savings in training time. On ImageNet, CUP leads to a 2.47× reduction in FLOPs on ResNet-50 with less than a 1% drop in top-5 accuracy. Notably, in the retrain-free setting, CUP-RF saves over 10 hours of training time on 3 GPUs in comparison to state-of-the-art methods. The code for CUP is open-sourced.
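The idea of clustering similar filters under a threshold t and then searching over t can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each filter is represented by its flattened weights, uses complete-linkage agglomerative clustering with Euclidean distance, keeps the filter with the largest L1 norm in each cluster, and treats the total number of kept filters as a stand-in for the compute budget. All function names (`cluster_filters`, `prune_layer`, `search_threshold`) are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two flattened filters."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_filters(filters, t):
    """Agglomerative clustering (complete linkage, an assumption here).
    Merging stops once the closest pair of clusters is farther apart than t."""
    clusters = [[i] for i in range(len(filters))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members.
                d = max(distance(filters[a], filters[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > t:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def prune_layer(filters, t):
    """Return indices of kept filters: one representative per cluster.
    The representative is the filter with the largest L1 norm (an assumption)."""
    keep = []
    for cluster in cluster_filters(filters, t):
        keep.append(max(cluster, key=lambda i: sum(abs(w) for w in filters[i])))
    return sorted(keep)

def search_threshold(layers, budget, step=0.05, t_max=10.0):
    """Line search on t: raise t until the total number of kept filters
    (a proxy for the compute budget) is at most `budget`."""
    t = 0.0
    while t <= t_max:
        kept = [prune_layer(filters, t) for filters in layers]
        if sum(len(k) for k in kept) <= budget:
            return t, kept
        t += step
    raise ValueError("budget not reachable within t_max")

# Toy layer: two pairs of near-duplicate filters plus one distinct filter.
layer = [[1.0, 0.0], [1.1, 0.0], [0.0, 1.0], [0.0, 0.9], [5.0, 5.0]]
t, kept = search_threshold([layer], budget=3)
print(t, kept)
```

Because the same t is applied to every layer, layers with more near-duplicate filters lose more of them, which is one way to read the non-uniform layer-wise pruning the abstract describes.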
