Abstract: The advancement of deep models poses great challenges to real-world deployment because of the limited computational power and storage space on edge devices. To address this problem, existing works quantize deep models during training, i.e., they convert full-precision weights ($r$) into discrete values ($q$) within a supervised training framework. However, the quantization process is non-differentiable, which leads to either infinite or zero gradients ($g_r$) for $r$. To circumvent this, most training-based quantization methods approximate $g_r$ with the gradient ($g_q$) of $q$, either by clipping it via the Straight-Through Estimator (STE) or by manually designing the gradient computation. However, these heuristics only make training-based quantization applicable, without further analysis of how the approximated gradients actually assist the training of a quantized network. In this paper, we propose to learn $g_r$ with a neural network. Specifically, a meta network is trained with $g_q$ and $r$ as inputs, and its output is used as $g_r$ for the subsequent weight update. The meta network is updated jointly with the original quantized network, which alleviates the non-differentiability problem and allows the whole process to be trained end-to-end. Extensive experiments on CIFAR10/100 and ImageNet with various network architectures demonstrate that our method achieves a faster convergence rate and better performance.
Code Link: https://github.com/csyhhu/MetaQuant
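The sketch below illustrates the core idea from the abstract in PyTorch: a meta network maps $(g_q, r)$ to a learned gradient $g_r$, which updates the full-precision weights, while the meta network itself is trained end-to-end through the task loss on the re-quantized weights. This is not the implementation from the linked repository; the names (`MetaGradientNet`, `quantize_ste`), the uniform 2-bit quantizer, and the dummy loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MetaGradientNet(nn.Module):
    """Tiny shared MLP that maps (g_q, r) element-wise to a learned g_r."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, g_q, r):
        # Treat each scalar weight as one sample with features (g_q, r).
        x = torch.stack([g_q.flatten(), r.flatten()], dim=1)
        return self.net(x).view_as(r)

def quantize_ste(r, num_bits=2):
    """Uniform quantization with a straight-through estimator for backprop."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = r.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.round(r / scale).clamp(-qmax - 1, qmax) * scale
    return r + (q - r).detach()  # identity gradient through the rounding

torch.manual_seed(0)
r = torch.randn(64, 32)                      # full-precision weights (updated manually)
meta_net = MetaGradientNet()
meta_opt = torch.optim.Adam(meta_net.parameters(), lr=1e-3)
lr_w = 0.1
x = torch.randn(8, 32)                       # dummy input batch

# 1) Forward/backward with quantized weights to obtain g_q.
q = quantize_ste(r).detach().requires_grad_(True)
loss = (x @ q.t()).pow(2).mean()             # dummy task loss
g_q = torch.autograd.grad(loss, q)[0]

# 2) Meta network maps (g_q, r) to a learned gradient g_r for r.
g_r = meta_net(g_q, r)

# 3) Apply g_r to r; keeping the graph lets the task loss on the updated,
#    re-quantized weights train the meta network end-to-end.
r_updated = r - lr_w * g_r
loss_after = (x @ quantize_ste(r_updated).t()).pow(2).mean()
meta_opt.zero_grad()
loss_after.backward()
meta_opt.step()

# 4) Commit the weight update to the full-precision weights.
r = r_updated.detach()
```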
CMT Num: 2171