1-Bit FQT: Pushing the Limit of Fully Quantized Training to 1-bit

Chang Gao; JingRen Hou; Kang Zhao; Jiaqi Wang; Jianfei Chen; Liping Jing

28 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: efficient machine learning, quantization methods, efficient training algorithms, fully quantized training

Abstract: Fully quantized training (FQT) accelerates the training of deep neural networks by quantizing the activations, weights, and gradients into lower precision. To explore the ultimate limit of FQT (the lowest achievable precision), we make a first attempt at 1-bit FQT. We provide a theoretical analysis of FQT based on Adam and SGD, revealing that the gradient variance influences the convergence of FQT. Building on these theoretical results, we introduce an Average 1-bit Quantization (AQ) strategy. The strategy leverages the heterogeneity of gradients to mitigate gradient variance by pruning less informative gradients and enhancing the numerical precision of the remaining gradients. Additionally, we propose Sample Channel joint Quantization (SCQ), which uses different quantization strategies in the computation of weight gradients and activation gradients to ensure that the method is friendly to low-bitwidth hardware. Finally, we present a framework to deploy our algorithm. For fine-tuning VGGNet-16 and ResNet-18 on multiple datasets, our algorithm achieves an average accuracy improvement of approximately 6% compared to per-sample quantization. Moreover, our training speedup can reach a maximum of 5.13× compared to full-precision training.
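
For readers unfamiliar with the per-sample baseline the abstract compares against, the following is a minimal sketch of a generic unbiased 1-bit stochastic quantizer applied row by row. It is not the authors' AQ or SCQ method; the function name `stochastic_1bit_per_sample` and the toy gradient values are assumptions made only for illustration. Each row (one sample) is mapped onto its own two levels, its minimum and maximum, with rounding probabilities chosen so that the quantized value equals the original in expectation. The variance this rounding introduces is the quantity the abstract identifies as affecting convergence.

```python
import random

def stochastic_1bit_per_sample(grad_rows, rng):
    """Quantize each row to two levels {row min, row max} with stochastic rounding.

    Hypothetical helper for illustration; a generic per-sample 1-bit quantizer,
    not the paper's AQ/SCQ.
    """
    out = []
    for row in grad_rows:
        lo, hi = min(row), max(row)
        span = hi - lo
        q_row = []
        for x in row:
            # Probability of rounding up to hi; chosen so that E[q] = x.
            p_up = (x - lo) / span if span > 0 else 0.0
            q_row.append(hi if rng.random() < p_up else lo)
        out.append(q_row)
    return out

rng = random.Random(0)
# Toy gradients: the first row has a much larger range than the second,
# so its quantized entries carry much larger variance.
grads = [[0.2, -1.0, 0.5, 3.0], [0.01, 0.02, -0.03, 0.0]]
quantized = stochastic_1bit_per_sample(grads, rng)
```

In this sketch a row with a wide value range produces a high-variance estimate, which gives some intuition for why a strategy that treats heterogeneous gradients differently could reduce the overall variance.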

Supplementary Material: zip

Primary Area: infrastructure, software libraries, hardware, systems, etc.

Submission Number: 13456
