$R^2$: Range Regularization for Model Compression and Quantization

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: regularization, quantization, compression, post-training quantization, quantization-aware training
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a weight regularization for model quantization that removes outliers from the weights of the pre-trained model. SOTA PTQ and QAT methods improve when combined with our regularization.
Abstract: Model parameter regularization is a widely used technique to improve generalization, but it can also be used to shape the weight distribution for other purposes. In this work, we propose range regularization ($R^2$) for building quantization- and compression-friendly models by removing outliers from the weights during training. By effectively restricting the range of the weights, we mold the overall distribution into a tight shape that preserves high quantization bit resolution, allowing model compression and quantization techniques to better utilize their limited numeric representation power. We introduce $L_\infty$ regularization, its extension margin regularization, and a new soft-min-max regularization, each used as a regularization loss during full-precision model training. We show that this technique generalizes well to post-training quantization, quantization-aware training methods such as EWGS, and compression techniques such as DKM. Coupled with state-of-the-art quantization and compression techniques, models trained with $R^2$ perform better on average, particularly at lower bit widths with a 16x compression ratio. Our results show that $R^2$ yields state-of-the-art 2-bit quantized models for heavily parameter-constrained models such as MobileNet V1 and V2 when coupled with EWGS. Additionally, at a high compression ratio (32x), models trained with $R^2$ perform significantly better than those trained without it.
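To make the idea concrete, below is a minimal PyTorch sketch of how an $L_\infty$-style range penalty could be added to the training loss. The paper's exact formulation (and its margin and soft-min-max variants) is not given in the abstract, so the function name `linf_range_penalty`, the per-layer max-absolute-weight form, and the `strength` coefficient are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


def linf_range_penalty(model: nn.Module, strength: float = 1e-4) -> torch.Tensor:
    """Hypothetical L-infinity range penalty.

    Penalizes the largest-magnitude weight in each weight tensor, which
    discourages outliers that would otherwise widen the quantization range.
    """
    device = next(model.parameters()).device
    penalty = torch.zeros((), device=device)
    for _, param in model.named_parameters():
        if param.dim() > 1:  # weight matrices/conv kernels; skip biases and norm params
            penalty = penalty + param.abs().max()
    return strength * penalty


# Usage (illustrative): add the penalty to the task loss during
# full-precision training, before applying PTQ or QAT.
#   loss = criterion(model(x), y) + linf_range_penalty(model, strength=1e-4)
#   loss.backward()
```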
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5169