- TL;DR: A sampling-based filter pruning approach for convolutional neural networks exhibiting provable guarantees on the size and performance of the pruned network.
- Abstract: We present a provable, sampling-based approach for generating compact Convolutional Neural Networks (CNNs) by identifying and removing redundant filters from an over-parameterized network. Our algorithm uses a small batch of input data points to assign a saliency score for each filter and constructs an importance sampling distribution where filters that highly affect the output are sampled with correspondingly high probability. Unlike weight pruning approaches that lead to irregular sparsity patterns -- requiring specialized libraries or hardware to enable computational speedups -- our approach compresses the original network to a slimmer subnetwork, which enables accelerated inference with any off-the-shelf deep learning library and hardware. Existing filter pruning methods are generally data-oblivious, rely on heuristics for evaluating the parameter importance, or require manual and tedious hyper-parameter tuning. In contrast, our method is data-informed, exhibits provable guarantees on the size and performance of the pruned network, and is widely applicable to varying network architectures and data sets. Our analytical bounds bridge the notions of compressibility and importance of network structures, which gives rise to a fully-automated procedure for identifying and preserving the filters in layers that are essential to the network's performance. Our experimental results across varying pruning scenarios show that our algorithm consistently generates sparser and more efficient models than those generated by existing filter pruning approaches.
- Keywords: theory, compression, filter pruning, neural networks