Double Rounding Quantization for Flexible Deep Neural Network Compression

23 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Model Quantization, Double Rounding, Mixed-Precision Super-Net
Abstract: Model quantization is widely applied to compress and accelerate deep neural networks due to its simplicity and adaptability. The quantization bit-width is typically predefined for a given neural network. However, bit-width requirements vary across hardware platforms and transmission scenarios, so training and storing a separate model for each setting incurs considerable cost. Joint training of multiple bit-widths at once (multi-bit quantization) has been proposed to address this issue. In this paper, we propose a Double Rounding quantization method that stores the highest bit-width model instead of its full-precision counterpart and fully exploits the representable value range. Nevertheless, performance during once-joint training degrades significantly due to inconsistent gradients between high-bit and low-bit quantization. To tackle this problem, we adaptively adjust the learning rates of the different bit-widths during training. We also apply our method to mixed-precision super-nets and introduce a novel weighted-probability training strategy. Experimental results demonstrate that the proposed method outperforms state-of-the-art once-joint quantization-aware training methods on the ImageNet dataset. The code will be available soon.
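The abstract describes deriving low-bit models from a stored highest-bit-width model rather than from full precision. Below is a minimal, hypothetical sketch of one way such nested ("double") rounding could work, assuming symmetric uniform quantization with a single scale; the function name, bit-widths, and exact clamping are illustrative assumptions, not the paper's formulation.

```python
import torch

def double_rounding_quantize(w, scale, high_bits=8, low_bits=4):
    """Hypothetical sketch: quantize once to the highest bit-width, then
    derive the low-bit integers by rounding the stored high-bit integers
    again, so only the high-bit model needs to be saved."""
    # First rounding: full-precision weights -> highest bit-width integers.
    qmax_high = 2 ** (high_bits - 1) - 1
    q_high = torch.clamp(torch.round(w / scale), -qmax_high - 1, qmax_high)

    # Second rounding: low-bit integers obtained directly from q_high
    # (no access to the full-precision weights is needed).
    shift = 2 ** (high_bits - low_bits)
    qmax_low = 2 ** (low_bits - 1) - 1
    q_low = torch.clamp(torch.round(q_high / shift), -qmax_low - 1, qmax_low)

    # Dequantized views at both precisions.
    w_high = q_high * scale
    w_low = q_low * (scale * shift)
    return w_high, w_low
```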
Supplementary Material: pdf
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7387