Automated Model Compression by Jointly Applied Pruning and Quantization

Wenting Tang, Xingxing Wei, Bo Li

16 May 2023 | OpenReview Archive Direct Upload | Readers: Everyone

Abstract: Although deep neural networks (DNNs) achieve excellent performance in real-world computer vision tasks, network compression is often necessary to deploy DNNs on edge devices such as mobile phones. In the traditional \emph{deep compression} framework, iteratively performing network pruning and quantization successfully reduces model size and computation cost. However, such a step-wise application of pruning and quantization may lead to suboptimal solutions and unnecessary time consumption. In this paper, we tackle this issue by formulating network pruning and quantization as a unified \emph{joint compression} problem and then using AutoML to solve it automatically. We observe that pruning can be regarded as channel-wise quantization with 0 bits. Thus, the separate two-step pruning and quantization can be simplified to one-step quantization with mixed precision. This unification not only simplifies the compression pipeline but also avoids compression divergence. To implement this idea, we propose \textbf{A}utomated model compression by \textbf{J}ointly applied \textbf{P}runing and \textbf{Q}uantization (AJPQ). AJPQ has a hierarchical architecture: the layer controller controls the layer sparsity, and the channel controller decides the bitwidth for each kernel. Following the same importance criterion, the layer controller and the channel controller collaboratively decide the compression strategy. With the help of reinforcement learning, this one-step compression is achieved automatically. Compared with state-of-the-art automated compression methods, our method obtains better accuracy while considerably reducing storage.
For fixed-precision quantization, AJPQ achieves more than $5\times$ model size and $2\times$ computation reduction with a slight performance increase for SensenNet in remote sensing object detection; when mixed precision is allowed, AJPQ achieves a $5\times$ model size reduction with only a $1.06\%$ top-5 accuracy decline for MobileNet in classification.
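
The observation that pruning is channel-wise quantization with 0 bits can be illustrated with a small sketch. The code below is not the AJPQ implementation: the function names (`quantize_channel`, `compress_layer`, `model_size_bits`), the min-max uniform quantizer, and the hand-picked bitwidths are all assumptions made for illustration, and the controllers and reinforcement learning search described in the abstract are omitted. It only shows how a single per-channel bitwidth list can express both pruning (0 bits) and mixed-precision quantization (1 or more bits).

```python
from typing import List, Sequence


def quantize_channel(weights: Sequence[float], bits: int) -> List[float]:
    """Quantize one channel's weights to `bits` bits (hypothetical helper).

    A bitwidth of 0 zeroes the whole channel, i.e. prunes it.
    Otherwise a plain min-max uniform quantizer with 2**bits levels is used;
    this quantizer is an assumption, not the scheme from the paper.
    """
    if bits < 0:
        raise ValueError("bits must be non-negative")
    if bits == 0:
        return [0.0 for _ in weights]  # 0 bits: the channel is pruned
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)  # constant channel, nothing to quantize
    step = (hi - lo) / (2 ** bits - 1)  # spacing between adjacent levels
    return [lo + round((w - lo) / step) * step for w in weights]


def compress_layer(channels: Sequence[Sequence[float]],
                   bitwidths: Sequence[int]) -> List[List[float]]:
    """Apply one bitwidth per channel; pruning and quantization share one pass."""
    if len(channels) != len(bitwidths):
        raise ValueError("need exactly one bitwidth per channel")
    return [quantize_channel(c, b) for c, b in zip(channels, bitwidths)]


def model_size_bits(channels: Sequence[Sequence[float]],
                    bitwidths: Sequence[int]) -> int:
    """Storage for the weights alone: pruned channels cost 0 bits."""
    return sum(len(c) * b for c, b in zip(channels, bitwidths))
```

For example, calling `compress_layer([[0.5, -0.25, 0.1], [1.0, 0.2, -1.0]], [0, 1])` zeroes the first channel and maps the second to its two extreme values, so the weights need 3 bits in total instead of 192 bits at 32-bit precision. In the setting the abstract describes, the bitwidth list would be chosen by the layer and channel controllers rather than by hand.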
