Abstract: Deep neural networks (DNNs) have rapidly found applications in many scenarios.
However, their large computational cost and memory consumption are barriers
to deployment in computation-constrained applications. DNN model quantization is a widely
used method to reduce the storage and computation burden of a DNN by decreasing
the bit width. In this paper, we propose a novel cursor-based adaptive quantization
method using differentiable architecture search (DAS). The multi-bit
quantization mechanism is formulated as a DAS process with a continuous cursor
that represents the candidate quantization bit width. The cursor-based DAS adaptively
searches for the desired quantization bit width for each layer.
The DAS process is solved via an alternating approximate optimization procedure designed
for the mixed quantization scheme of a DNN model. We further devise a new loss
function for the search process that simultaneously optimizes the accuracy and the parameter
size of the model. In the quantization step, a new strategy adopts the
two integers closest to the cursor as the bit widths to jointly quantize the DNN, which
reduces quantization noise and avoids the local convergence problem. Comprehensive
experiments on benchmark datasets show that our cursor-based adaptive
quantization approach achieves a new state of the art for multi-bit quantization
and efficiently obtains smaller models with comparable or even better
classification accuracy.
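
To make the cursor mechanism more concrete, below is a minimal NumPy sketch of one possible reading of the abstract: each layer keeps a continuous cursor, the weights are quantized at the two integer bit widths closest to the cursor, and the two results are blended with coefficients proportional to the cursor's proximity to each integer. The uniform quantizer, the function names (uniform_quantize, cursor_quantize), and the linear blending rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def uniform_quantize(w, bits):
    """Symmetric uniform quantization of a weight tensor to a given bit width.
    (Illustrative quantizer; the paper may use a different scheme.)"""
    levels = 2 ** (bits - 1) - 1              # e.g. 127 representable magnitudes for 8 bits
    scale = np.max(np.abs(w)) / levels + 1e-12
    return np.round(w / scale) * scale

def cursor_quantize(w, cursor):
    """Blend the quantizations at the two integer bit widths nearest the cursor.

    Assumed reading of the abstract: with cursor c, the weights are quantized
    at floor(c) and ceil(c) bits, and the two results are mixed with weights
    proportional to the cursor's closeness to each integer, keeping the
    operation differentiable with respect to the cursor.
    """
    lo, hi = int(np.floor(cursor)), int(np.ceil(cursor))
    if lo == hi:                               # cursor already sits on an integer bit width
        return uniform_quantize(w, lo)
    alpha = cursor - lo                        # weight toward the higher bit width
    return (1 - alpha) * uniform_quantize(w, lo) + alpha * uniform_quantize(w, hi)

# Example: quantize one layer's weights with a searched cursor of 3.3 bits
w = np.random.randn(4, 4).astype(np.float32)
w_q = cursor_quantize(w, cursor=3.3)
```

In the full method, the per-layer cursors would presumably be updated by gradient descent together with a loss that balances task accuracy against model size, as the abstract describes; the sketch only illustrates the forward quantization step.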