Keywords: Pruning, Depth-wise Separable Convolution, GPU
TL;DR: Hardware-aware Pruning for Fast Depth-wise Separable Convolution
Abstract: Depth-wise Separable Convolution (DSConv) retains strong representational power with fewer parameters and less computation, which has led to its adoption in almost all state-of-the-art CNN models.
Because DSConv models are already compact, they are hard to prune, and few existing pruning techniques target depth-wise convolution (DW-conv).
In this paper, we present Depth-wise Separable Convolution Pruning (DEPrune), a novel pruning method applied to both point-wise and depth-wise convolutions.
DEPrune is optimized by analyzing the computation of DSConv on GPUs.
DEPrune employs a fine-grained pruning approach, yet it achieves the structured sparsity typically absent in fine-grained pruning, enabling practical hardware acceleration.
Moreover, this method maintains a high pruning ratio without causing any accuracy drop.
We additionally present two techniques that further enhance DEPrune performance: 1) balanced workload tuning (BWT), and 2) hardware-aware sparsity recalibration (HSR).
Experimental results show that DEPrune achieves up to $3.74\times$ practical speedup in DSConv inference on GPUs while maintaining the accuracy of EfficientNet-B0 on ImageNet.
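
To illustrate why DSConv models are already compact, the following minimal sketch compares the weight count of a standard convolution with that of a depth-wise convolution followed by a point-wise (1 x 1) convolution. The helper names and the layer shape (32 input channels, 64 output channels, 3 x 3 kernel) are assumptions chosen for illustration and are not taken from the paper. Biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dsconv_params(c_in, c_out, k):
    """Weights in a depth-wise k x k conv plus a 1 x 1 point-wise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, used only as an example.
c_in, c_out, k = 32, 64, 3
standard = conv_params(c_in, c_out, k)
separable = dsconv_params(c_in, c_out, k)
print(standard, separable)  # 18432 2336
```

In this example the point-wise convolution holds most of the DSConv weights (2048 of 2336), while the depth-wise convolution holds only 288.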
Primary Area: Deep learning architectures
Submission Number: 880