Training with Quantization Noise for Extreme Model Compression

Pierre Stock; Angela Fan; Benjamin Graham; Edouard Grave; Rémi Gribonval; Herve Jegou; Armand Joulin

Published: 12 Jan 2021, Last Modified: 12 Oct 2025, ICLR 2021 Poster, Readers: Everyone

Keywords: Compression, Efficiency, Product Quantization

Abstract: We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients are approximated with the Straight-Through Estimator (STE). In this paper, we extend this approach to work with extreme compression methods where the approximations introduced by STE are severe. Our proposal is to quantize only a different random subset of weights during each forward pass, allowing unbiased gradients to flow through the other weights. Controlling the amount of noise and its form allows for extreme compression rates while maintaining the performance of the original model. As a result, we establish new state-of-the-art trade-offs between accuracy and model size in both natural language processing and image classification. For example, applying our method to state-of-the-art Transformer and ConvNet architectures, we achieve 82.5% accuracy on MNLI by compressing RoBERTa to 14 MB and 80.0% top-1 accuracy on ImageNet by compressing an EfficientNet-B3 to 3.3 MB.
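
The core idea in the abstract, quantizing only a random subset of weights on each forward pass, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the rate parameter `p`, and the per-weight symmetric uniform quantizer are all assumptions chosen for brevity, and the sketch covers the forward pass only. In an autograd framework, the quantized entries would receive STE gradients while the untouched entries would receive exact ones.

```python
import random


def uniform_quantize(w, scale, n_bits=8):
    """Round w to the nearest level of a symmetric uniform grid (hypothetical helper)."""
    q_max = 2 ** (n_bits - 1) - 1
    q = max(-q_max, min(q_max, round(w / scale)))
    return q * scale


def quant_noise_forward(weights, p, n_bits=8, rng=random):
    """Quantize each weight independently with probability p.

    Returns the noisy weights and the boolean mask of quantized entries.
    p = 1.0 quantizes every weight, as in standard Quantization Aware
    Training; p = 0.0 leaves the weights untouched.
    """
    q_max = 2 ** (n_bits - 1) - 1
    # One scale for the whole tensor; fall back to 1.0 if all weights are zero.
    scale = max(abs(w) for w in weights) / q_max or 1.0
    noisy, mask = [], []
    for w in weights:
        hit = rng.random() < p  # a fresh random subset on every call
        mask.append(hit)
        noisy.append(uniform_quantize(w, scale, n_bits) if hit else w)
    return noisy, mask


if __name__ == "__main__":
    weights = [0.31, -1.27, 0.05, 0.9]
    noisy, mask = quant_noise_forward(weights, p=0.5, rng=random.Random(0))
    for w, n, m in zip(weights, noisy, mask):
        print(f"{w:+.4f} -> {n:+.4f} (quantized: {m})")
```

Calling `quant_noise_forward` once per training step would draw a new subset each time, so every weight is sometimes seen in its quantized form and sometimes in full precision. The scalar quantizer here is only a stand-in; the abstract targets more aggressive schemes, and the keywords name product quantization.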

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Code: [pytorch/fairseq](https://github.com/pytorch/fairseq) + [3 community implementations](https://paperswithcode.com/paper/?openreview=dV19Yyi1fS3)

Data: [ImageNet](https://paperswithcode.com/dataset/imagenet), [MultiNLI](https://paperswithcode.com/dataset/multinli), [WikiText-103](https://paperswithcode.com/dataset/wikitext-103), [WikiText-2](https://paperswithcode.com/dataset/wikitext-2)

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/training-with-quantization-noise-for-extreme/code)
