VQN: Variable Quantization Noise for Neural Network Compression

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission · Readers: Everyone
Abstract: Quantization refers to a set of methods that compress a neural network by representing its parameters with fewer bits. However, applying quantization to a neural network after training often leads to severe performance regressions. Quantization Aware Training (QAT) addresses this problem by simulating quantization during training so that the model learns robustness to inference-time quantization. A key drawback of this approach is that quantization functions induce biased gradients during backpropagation, preventing the network from fitting the learning task as well as it otherwise could. Fan et al. addressed this issue with Quant-Noise, which applies simulated quantization to only a fixed proportion of parameters, called the quantization noise rate, during training. Our study, Variable Quantization Noise (VQN), builds on their technique by exploring a variable quantization noise rate instead of a fixed one. We craft three candidate functions for varying the noise rate during training and evaluate the variants on three datasets with three quantization schemes each. First, we report negative results for our hand-crafted candidate functions. Second, we observe somewhat positive results for a method, originally intended as an ablation study, that randomly varies the noise rate during training. This method outperforms Quant-Noise on two out of three quantization schemes for all three tested datasets. Moreover, on two of the datasets, this method at 4x compression matches or exceeds the performance of even the uncompressed model. Future work should determine whether these unexpected results hold for more datasets and quantization schemes, and should investigate other schemes for varying the noise rate during training.
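To make the mechanism described in the abstract concrete, here is a minimal sketch of quantization noise with a variable noise rate, assuming PyTorch, 8-bit uniform fake quantization, and a straight-through estimator. The function names (fake_quantize, quant_noise, sampled_noise_rate) are illustrative placeholders, not the authors' implementation, and the uniform sampling of the noise rate is only one possible reading of "randomly varying the noise rate."

```python
# Illustrative sketch of variable-rate quantization noise (not the paper's code).
import torch


def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate quantization: round weights to a uniform grid, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale


def quant_noise(w: torch.Tensor, p: float, num_bits: int = 8) -> torch.Tensor:
    """Quant-Noise-style operator: fake-quantize a random fraction p of the
    weights, leaving the remaining (1 - p) fraction untouched so their
    gradients stay unbiased."""
    mask = torch.rand_like(w) < p
    w_q = fake_quantize(w, num_bits)
    # Straight-through estimator applied only to the quantized subset.
    return torch.where(mask, w + (w_q - w).detach(), w)


def sampled_noise_rate() -> float:
    """One way to vary the noise rate: draw a fresh rate uniformly at each
    training step instead of keeping it fixed (an assumption for illustration)."""
    return torch.rand(1).item()


# Usage inside a single training step (illustrative):
weight = torch.randn(256, 256, requires_grad=True)
p_t = sampled_noise_rate()               # noise rate varies every step
noisy_weight = quant_noise(weight, p_t)  # use noisy_weight in the forward pass
```

In this sketch, a hand-crafted schedule could be substituted for sampled_noise_rate (e.g., a function of the training step), which is the kind of variation the three candidate functions in the abstract explore.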