QGen: On the Ability to Generalize in Quantization Aware Training

TMLR Paper 2553 Authors

19 Apr 2024 (modified: 12 Jul 2024) · Under review for TMLR · CC BY-SA 4.0
Abstract: Quantization lowers memory usage, computational requirements, and latency by using fewer bits to represent model weights and activations. In this work, we investigate the generalization properties of quantized neural networks, a characteristic that has received little attention despite its implications for model performance. First, we develop a theoretical model of quantization in neural networks and demonstrate how quantization functions as a form of regularization. Second, motivated by recent work connecting the sharpness of the loss landscape and generalization, we derive an approximate bound on the generalization of quantized models conditioned on the amount of quantization noise. We then validate our hypothesis by training over 2000 convolutional and transformer-based models on the CIFAR-10, CIFAR-100, and ImageNet datasets.
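For readers unfamiliar with the mechanism the abstract refers to, the sketch below illustrates uniform fake-quantization of a weight tensor: weights are rounded to a small number of levels and then mapped back to floating point, so the residual between the dequantized and original weights is the quantization noise the paper analyzes. This is a minimal illustrative sketch, not the authors' implementation; the function name `fake_quantize` and the chosen bit-widths are assumptions.

```python
import numpy as np

def fake_quantize(w: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Simulate uniform round-to-nearest quantization of a weight tensor.

    Illustrative only: maps w onto 2**num_bits levels spanning its own
    range, then dequantizes back to floating point.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / (qmax - qmin) if w_max > w_min else 1.0
    q = np.clip(np.round((w - w_min) / scale), qmin, qmax)
    return q * scale + w_min

# The dequantized weights equal the originals plus a bounded rounding
# perturbation; this perturbation is the "quantization noise" that the
# paper argues acts like a regularizer during quantization-aware training.
w = np.random.randn(1000).astype(np.float32)
w_q = fake_quantize(w, num_bits=4)
noise = w_q - w
print(f"noise std: {noise.std():.4f}, max |noise|: {np.abs(noise).max():.4f}")
```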
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1. We adjusted our claims as requested by reviewer kpuY to "Our theoretical analysis on simplified models suggests that quantization can be seen as a regularizer."
2. We emphasize that our theoretical analysis is an approximation.
3. We added an appendix to the main paper describing how the flatness measures were computed.
4. We added the source code for our experiments.
5. We clarified the caption under Figure 2.
Assigned Action Editor: ~Naigang_Wang1
Submission Number: 2553