Fixed Point Quantization of Deep Convolutional Networks

Darryl D. Lin; Sachin S. Talathi; V. Sreekanth Annapureddy

20 Jan 2025 (modified: 18 Feb 2016) | ICLR 2016 | Readers: Everyone

Abstract: In recent years, increasingly complex architectures for deep convolutional networks (DCNs) have been proposed to boost performance on image recognition tasks. However, these gains have come at the cost of a substantial increase in computation and model storage requirements. Fixed point implementation of DCNs has the potential to alleviate some of these costs and facilitate deployment on embedded hardware. In this paper, we formulate and solve an optimization problem to identify the optimal fixed point bit-width allocation across layers, enabling efficient fixed point implementation of DCNs. Our experiments show that, compared with equal bit-width settings, optimized bit-width allocation offers a >20% reduction in model size without any loss in accuracy on the CIFAR-10 benchmark. We also demonstrate that fine-tuning can further enhance the accuracy of fixed point DCNs beyond that of the original floating point model. In doing so, we report a new state-of-the-art fixed point performance of 6.78% error rate on the CIFAR-10 benchmark.
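
To make the idea of per-layer bit-width allocation concrete, the following is a minimal sketch and not the authors' method: it shows a generic signed fixed point rounding step and how total weight storage depends on the bit-width assigned to each layer. The function names, the rounding and saturation scheme, and the layer sizes and bit-widths are all hypothetical choices made for illustration; the optimization problem that the paper solves to pick the bit-widths is not reproduced here.

```python
def quantize_fixed_point(x, bit_width, frac_bits):
    """Round x to a signed fixed point value (assumed format, for illustration).

    bit_width is the total number of bits including the sign bit;
    frac_bits is how many of those bits sit after the binary point.
    Values outside the representable range are saturated.
    """
    scale = 2 ** frac_bits
    q_min = -(2 ** (bit_width - 1))
    q_max = 2 ** (bit_width - 1) - 1
    q = round(x * scale)              # nearest integer code
    q = max(q_min, min(q_max, q))     # saturate to the representable range
    return q / scale


def model_size_bits(param_counts, bit_widths):
    """Total weight storage in bits, given one bit-width per layer."""
    return sum(n * b for n, b in zip(param_counts, bit_widths))


# Hypothetical three-layer network: parameter counts are made up.
param_counts = [1000, 2000, 4000]
equal_alloc = [8, 8, 8]      # same bit-width for every layer
per_layer_alloc = [8, 6, 4]  # illustrative per-layer choice, not an optimized one

equal_size = model_size_bits(param_counts, equal_alloc)
per_layer_size = model_size_bits(param_counts, per_layer_alloc)
```

In this toy setup, assigning fewer bits to the layers that hold the most parameters reduces storage the most; whether accuracy is preserved depends on how sensitive each layer is to quantization noise, which is the trade-off the bit-width optimization is meant to handle.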

Conflicts: qualcomm.com
