Riemannian Manifold Embeddings for Straight-Through Estimator

Published: 28 Jan 2022, Last Modified: 13 Feb 2023. ICLR 2022 Submission.
Keywords: Neural network quantization, Riemannian manifold, Information geometry, Mirror descent
Abstract: Quantized Neural Networks (QNNs) replace full-precision weights $\boldsymbol{W}$ with quantized weights $\boldsymbol{\hat{W}}$, which makes it possible to deploy large models on mobile and miniaturized devices. However, the non-differentiable quantization function yields gradients that are either zero or infinite, which significantly hinders the training of quantized models. To address this problem, most training-based quantization methods use the Straight-Through Estimator (STE) to approximate the gradients $\nabla_{\boldsymbol{W}}$ w.r.t. $\boldsymbol{W}$ with the gradients $\nabla_{\boldsymbol{\hat{W}}}$ w.r.t. $\boldsymbol{\hat{W}}$, under the premise that $\boldsymbol{W}$ is clipped to $[-1,+1]$. However, this naive application of STE introduces the gradient mismatch problem, which destabilizes the training process. In this paper, we propose a revised approximate gradient for penetrating the quantization function, derived via manifold learning. Specifically, by endowing the parameter space with a metric tensor, i.e., viewing it as a Riemannian manifold, we introduce Manifold Quantization (ManiQuant) via a revised STE to alleviate the gradient mismatch problem. Ablation studies and experimental results demonstrate that the proposed method achieves better and more stable performance with various deep neural networks on the CIFAR10/100 and ImageNet datasets.
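For concreteness, the vanilla STE baseline that the abstract describes can be sketched as follows. This is a minimal illustration assuming a sign-based binarizer (the specific quantizer is not stated in the abstract): the forward pass applies the non-differentiable quantization, and the backward pass passes the gradient w.r.t. $\boldsymbol{\hat{W}}$ straight through to $\boldsymbol{W}$, zeroed outside the clipping range $[-1,+1]$.

```python
import torch

class SignSTE(torch.autograd.Function):
    """Vanilla straight-through estimator for a sign quantizer (sketch)."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)  # W_hat: non-differentiable quantization

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Straight-through: reuse the gradient w.r.t. W_hat as the
        # gradient w.r.t. W, masked to the clipping range [-1, +1].
        return grad_output * (w.abs() <= 1.0).to(grad_output.dtype)

# Usage: full-precision weights are kept in [-1, +1] by clipping.
w = torch.empty(3).uniform_(-1, 1).requires_grad_()
w_hat = SignSTE.apply(w)
loss = (w_hat ** 2).sum()
loss.backward()  # w.grad holds the straight-through gradient
```

The gradient mismatch arises exactly here: the backward pass pretends the quantizer is the identity on $[-1,+1]$, even though the forward pass is a step function.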
One-sentence Summary: Quantize neural networks in Riemannian manifolds to alleviate the gradient mismatch problem
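Since the keywords list mirror descent, one common way to connect a clipped STE update to Riemannian geometry is to view it as mirror descent under a suitable mirror map on the constraint set. Below is a minimal sketch of such a step, assuming the hypothetical mirror map $\psi(w) = \operatorname{artanh}(w)$ for weights in $(-1,+1)$; this is a standard illustrative choice, not necessarily the metric tensor ManiQuant actually uses.

```python
import torch

def mirror_descent_step(w, grad_w_hat, lr=0.1, eps=1e-6):
    """One illustrative mirror-descent update for weights in (-1, +1).

    Assumes the mirror map psi(w) = artanh(w) (hypothetical choice);
    grad_w_hat is the STE gradient w.r.t. the quantized weights.
    """
    w = w.clamp(-1 + eps, 1 - eps)  # stay strictly inside the interval
    dual = torch.atanh(w)           # map to the dual (mirror) space
    dual = dual - lr * grad_w_hat   # plain gradient step in dual space
    return torch.tanh(dual)         # map back; result stays in (-1, 1)
```

Under this view, the clipping constraint is enforced implicitly by the geometry of the map rather than by hard projection, which is one way a Riemannian metric on the parameter space can reshape the straight-through gradient.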