Hardware-Friendly Post-Training Quantization: Input- and Output-Channelwise Scale and Offset

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: post-training quantization, CNN, computer vision, low-bit, discrete optimization, neural network quantization, calculation cost
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Input/output channel-wise offset and scaling for PTQ
Abstract: Post-training quantization enables swift quantization of neural networks using a minimal calibration dataset. However, these methods tend to underperform dramatically on hardware with fixed integer bit width, particularly in extremely low-bit quantization scenarios. In response, we introduce an optimized method for uniform channel-wise quantization that is compatible with existing hardware. This approach does not increase memory requirements and results in only a marginal increase in computation. Our strategy applies a channel-wise multiplier to the accumulated weight-activation products, yielding a more accurate result for the multiply-accumulate (MAC) operation in convolutional or fully-connected layers. We also present an optimization technique to determine the optimal channel grouping. To validate our proposed quantization scheme, we evaluated it on a variety of CNN-based models. Our approach improves accuracy under 2/4-bit weight and feature quantization by 1-5 percentage points while increasing the number of integer operations in convolution-based networks by less than 1.5%.
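The following is a minimal NumPy sketch of the general scheme the abstract describes: uniform per-channel weight quantization with a scale, activation quantization with a scale and offset (zero point), and an integer MAC whose per-output-channel result is corrected and rescaled by a channel-wise multiplier. This is an illustration under assumed details, not the authors' exact method; the function names, the 4-bit setting, and the fully-connected example are all assumptions.

```python
# Sketch of channel-wise scale/offset PTQ for a fully-connected layer (assumed details).
import numpy as np

def quantize_per_channel(w, n_bits=4):
    """Quantize weights [out_ch, in_ch] to signed integers with one scale per output channel."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax      # per-output-channel scale
    w_int = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return w_int, scale

def quantize_activations(x, n_bits=4):
    """Quantize activations [in_ch] to unsigned integers with a scale and an offset (zero point)."""
    qmax = 2 ** n_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min) / qmax
    offset = np.round(-x_min / scale).astype(np.int32)
    x_int = np.clip(np.round(x / scale) + offset, 0, qmax).astype(np.int32)
    return x_int, scale, offset

def int_mac_layer(x_int, w_int, x_scale, x_offset, w_scale):
    """Integer MAC per output channel, then a channel-wise rescale of the accumulator."""
    acc = w_int @ x_int                                      # integer accumulate
    acc -= x_offset * w_int.sum(axis=1)                      # correct for the activation offset
    return acc * (w_scale.squeeze(1) * x_scale)              # channel-wise multiplier

# Example: a 4-bit fully-connected layer on random data.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16)).astype(np.float32)
x = rng.normal(size=16).astype(np.float32)
w_int, w_scale = quantize_per_channel(w)
x_int, x_scale, x_offset = quantize_activations(x)
y_approx = int_mac_layer(x_int, w_int, x_scale, x_offset, w_scale)
print(np.abs(y_approx - w @ x).max())                        # quantization error
```

The offset correction term costs only one extra integer sum per output channel, which is consistent with the abstract's claim that the overhead in integer operations stays small.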
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4312