Quantized CNN: A Unified Approach to Accelerate and Compress Convolutional Networks

IEEE Trans. Neural Networks Learn. Syst., 2018 (modified: 13 Nov 2024)
Abstract: We are witnessing explosive development and widespread application of deep neural networks (DNNs) in various fields. However, DNN models, especially convolutional neural networks (CNNs), usually involve massive parameters and are computationally expensive, making them heavily dependent on high-performance hardware. This prohibits their further extension, e.g., to applications on mobile devices. In this paper, we present Quantized CNN, a unified approach to accelerate and compress convolutional networks. Guided by minimizing the approximation error of each layer's response, both fully connected and convolutional layers are carefully quantized. Inference can then be carried out efficiently on the quantized network, with much lower memory and storage consumption. Quantitative evaluation on two publicly available benchmarks demonstrates the promising performance of our approach: with comparable classification accuracy, it achieves 4 to 6× acceleration and 15 to 20× compression. With our method, accurate image classification can even be carried out directly on mobile devices within one second.
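The abstract's core idea is to quantize each layer's weights with small codebooks so that the layer's output is well approximated while storage and compute shrink. Below is a minimal product-quantization-style sketch for a fully connected layer's weight matrix, using plain k-means per subspace. All function names and parameters are illustrative assumptions, not the paper's actual algorithm, which additionally minimizes the error of the layer's *response* rather than the raw weight reconstruction error:

```python
import numpy as np

def quantize_fc_weights(W, n_codewords=16, n_subvectors=4, n_iters=20, seed=0):
    """Illustrative product quantization of an FC weight matrix W (d_in, d_out).

    The input dimension is split into `n_subvectors` subspaces; in each
    subspace, the d_out weight subvectors are clustered into `n_codewords`
    codewords via k-means and replaced by their nearest codeword.
    Returns the reconstructed (quantized) weight matrix.
    """
    rng = np.random.default_rng(seed)
    d_in, d_out = W.shape
    assert d_in % n_subvectors == 0, "d_in must split evenly into subspaces"
    sub = d_in // n_subvectors
    W_hat = np.empty_like(W)
    for s in range(n_subvectors):
        # Each row of `block` is one output neuron's subvector in subspace s.
        block = W[s * sub:(s + 1) * sub, :].T            # (d_out, sub)
        # Initialize the codebook from randomly chosen subvectors.
        init = rng.choice(d_out, size=n_codewords, replace=False)
        C = block[init].copy()                           # (n_codewords, sub)
        for _ in range(n_iters):
            # Assign each subvector to its nearest codeword (squared L2).
            d2 = ((block[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            assign = d2.argmin(1)
            # Update each codeword to the mean of its assigned subvectors.
            for k in range(n_codewords):
                mask = assign == k
                if mask.any():
                    C[k] = block[mask].mean(0)
        W_hat[s * sub:(s + 1) * sub, :] = C[assign].T
    return W_hat
```

Storing only the codebooks plus per-neuron codeword indices (instead of full-precision weights) is what yields the memory savings; the paper's reported 15 to 20× compression comes from a more careful, response-error-driven variant of this idea applied to both convolutional and fully connected layers.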
