Quantized CNN: A Unified Approach to Accelerate and Compress Convolutional Networks

IEEE Trans. Neural Networks Learn. Syst., 2018 (modified: 13 Nov 2024)
Abstract: We are witnessing explosive development and widespread application of deep neural networks (DNNs) in various fields. However, DNN models, especially convolutional neural networks (CNNs), usually involve massive parameters and are computationally expensive, making them heavily dependent on high-performance hardware. This prohibits their further extension, e.g., to applications on mobile devices. In this paper, we present Quantized CNN, a unified approach to accelerate and compress convolutional networks. Guided by minimizing the approximation error of each layer's response, both fully connected and convolutional layers are carefully quantized. Inference can then be carried out efficiently on the quantized network, with much lower memory and storage consumption. Quantitative evaluation on two publicly available benchmarks demonstrates the promising performance of our approach: with comparable classification accuracy, it achieves 4 to 6× acceleration and 15 to 20× compression. With our method, accurate image classification can even be carried out directly on mobile devices within one second.
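The abstract's core idea is to quantize each layer's weights with small codebooks so that the layer's output is well approximated while storage and compute shrink. Below is a minimal product-quantization-style sketch for a fully connected layer's weight matrix, using plain k-means per subspace. All function names and parameters are illustrative assumptions, not the paper's actual algorithm, which additionally minimizes the error of the layer's *response* rather than the raw weight reconstruction error:

```python
import numpy as np

def quantize_fc_weights(W, n_codewords=16, n_subvectors=4, n_iters=20, seed=0):
    """Illustrative product quantization of an FC weight matrix W (d_in, d_out).

    The input dimension is split into `n_subvectors` subspaces; in each
    subspace, the d_out weight subvectors are clustered into `n_codewords`
    codewords via k-means and replaced by their nearest codeword.
    Returns the reconstructed (quantized) weight matrix.
    """
    rng = np.random.default_rng(seed)
    d_in, d_out = W.shape
    assert d_in % n_subvectors == 0, "d_in must split evenly into subspaces"
    sub = d_in // n_subvectors
    W_hat = np.empty_like(W)
    for s in range(n_subvectors):
        # Each row of `block` is one output neuron's subvector in subspace s.
        block = W[s * sub:(s + 1) * sub, :].T            # (d_out, sub)
        # Initialize the codebook from randomly chosen subvectors.
        init = rng.choice(d_out, size=n_codewords, replace=False)
        C = block[init].copy()                           # (n_codewords, sub)
        for _ in range(n_iters):
            # Assign each subvector to its nearest codeword (squared L2).
            d2 = ((block[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            assign = d2.argmin(1)
            # Update each codeword to the mean of its assigned subvectors.
            for k in range(n_codewords):
                mask = assign == k
                if mask.any():
                    C[k] = block[mask].mean(0)
        W_hat[s * sub:(s + 1) * sub, :] = C[assign].T
    return W_hat
```

Storing only the codebooks plus per-neuron codeword indices (instead of full-precision weights) is what yields the memory savings; the paper's reported 15 to 20× compression comes from a more careful, response-error-driven variant of this idea applied to both convolutional and fully connected layers.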
