Abstract: Model quantization is considered a promising method
to greatly reduce the resource requirements of deep neural
networks. To deal with the performance drop induced by
quantization errors, a popular method is to use training data
to fine-tune quantized networks. In real-world environments,
however, such a method is frequently infeasible because
training data is unavailable due to security, privacy, or confidentiality concerns. Zero-shot quantization addresses such
problems, usually by taking information from the weights of
a full-precision teacher network to compensate for the performance drop of the quantized networks. In this paper, we first
analyze the loss surface of state-of-the-art zero-shot quantization techniques and provide several findings. In contrast
to usual knowledge distillation problems, zero-shot quantization often suffers from 1) the difficulty of optimizing multiple
loss terms together, and 2) the poor generalization capability
due to the use of synthetic samples. Furthermore, we observe
that many weights fail to cross the rounding threshold while
the quantized networks are trained, even when crossing it is
necessary for better performance. Based on these observations,
we propose AIT, a simple yet powerful technique for zero-shot quantization, which addresses the aforementioned two
problems in the following way: AIT i) uses only a KL distance
loss, without a cross-entropy loss, and ii) manipulates
gradients to guarantee that a certain portion of weights are
properly updated after crossing the rounding thresholds. Experiments show that AIT outperforms
many existing methods by a large margin, taking over the
overall state-of-the-art position in the field.
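
The two ingredients named above can be illustrated with a minimal sketch. It is not the AIT implementation: the helper names (`kl_divergence`, `crossing_fraction`, `scale_lr_for_crossing`), the uniform quantizer with a fixed `step_size`, and the rule of doubling the step length until a target fraction of weights change their rounded level are all assumptions made for illustration, standing in for the gradient manipulation the abstract describes.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(teacher_logits, student_logits):
    # KL(teacher || student) on class probabilities.
    # This is the only loss term in the sketch: no cross-entropy is added.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def quant_level(w, step_size):
    # Assumed uniform quantizer: round to the nearest integer level.
    return math.floor(w / step_size + 0.5)

def crossing_fraction(weights, grads, lr, step_size):
    # Fraction of weights whose integer level changes after one SGD step.
    crossed = sum(
        quant_level(w, step_size) != quant_level(w - lr * g, step_size)
        for w, g in zip(weights, grads)
    )
    return crossed / len(weights)

def scale_lr_for_crossing(weights, grads, lr, step_size,
                          target=0.1, max_doublings=30):
    # Hypothetical rule: double the step length until at least `target`
    # of the weights cross a rounding threshold (or the budget runs out).
    for _ in range(max_doublings):
        if crossing_fraction(weights, grads, lr, step_size) >= target:
            break
        lr *= 2.0
    return lr

# Usage: with a tiny learning rate no weight changes its quantized value,
# so the update is invisible after rounding; scaling the step fixes that.
weights = [0.04, 0.26, -0.11, 0.49]
grads = [1.0, -1.0, 0.5, -0.2]
base_lr = 1e-4
scaled_lr = scale_lr_for_crossing(weights, grads, base_lr, 0.1, target=0.5)
```

The usage lines show the failure mode the abstract points to: a step that is too small leaves every weight on the same side of its rounding threshold, so the quantized network does not change even though the latent weights do.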