Pruning Compact ConvNets for Efficient Inference

Published: 28 Jan 2022, Last Modified: 22 Oct 2023 · ICLR 2022 Submitted
Keywords: pruning, neural networks, computations, latency, imagenet
Abstract: Neural network pruning is frequently used to compress over-parameterized networks by large factors while incurring only marginal drops in generalization performance. However, the impact of pruning on networks that have already been highly optimized for efficient inference has not received the same level of attention. In this paper, we analyze the effect of pruning on computer-vision models, studying the state-of-the-art FBNetV3 family. We show that model pruning approaches can be used to further optimize networks trained through Neural Architecture Search (NAS). The resulting family of pruned models consistently outperforms existing FBNetV3 models at the same level of computation, and thus provides state-of-the-art results when trading off computational complexity against generalization performance on the ImageNet benchmark. Beyond better generalization, we also demonstrate that when computational resources are limited, pruning FBNetV3 models incurs only a fraction of the GPU-hours required to run a full-scale NAS.
One-sentence Summary: ConvNets optimized through NAS, such as the FBNetV3 family, can be compressed further through pruning to improve on the state of the art.
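The abstract does not spell out the pruning procedure, so as a rough illustration only, here is a minimal sketch of magnitude-based pruning applied to a pretrained ConvNet using PyTorch's `torch.nn.utils.prune`. The toy backbone stands in for FBNetV3 (which is not bundled with PyTorch), and the 50% sparsity target is an arbitrary assumption, not a figure from the paper.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a NAS-optimized backbone; in practice this would be
# a pretrained FBNetV3 model loaded from its own checkpoint.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 1000),
)

# Collect all weight-bearing modules to prune.
to_prune = [
    (m, "weight") for m in model.modules()
    if isinstance(m, (nn.Conv2d, nn.Linear))
]

# Global magnitude pruning: zero out the 50% of weights with the
# smallest absolute value, pooled across all listed layers.
prune.global_unstructured(
    to_prune, pruning_method=prune.L1Unstructured, amount=0.5
)

# Fold the pruning masks into the weights permanently; a fine-tuning
# pass would normally follow to recover accuracy.
for module, name in to_prune:
    prune.remove(module, name)

# Report the achieved global sparsity.
total = sum(m.weight.numel() for m, _ in to_prune)
zeros = sum((m.weight == 0).sum().item() for m, _ in to_prune)
print(f"global sparsity: {zeros / total:.1%}")
```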
Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/arxiv:2301.04502/code)