Keywords: language model, model compression, computational efficiency, QAT, quantization-aware training
TL;DR: Progressive quantization-aware training (QAT) with outlier channel splitting for low-bit LLMs, yielding a single once-for-any-precision model.
Abstract: Training large language models (LLMs) at ultra-low precision remains challenging: direct low-bit quantization-aware training (QAT) often suffers from slow convergence that demands substantial training budgets, as well as quantization errors arising from heavy-tailed outlier channels and the accumulation of errors across layers. To address these issues, we present Bit-by-Bit, a progressive QAT framework with outlier channel splitting. Our approach integrates three key components: (1) block-wise progressive training that reduces precision stage by stage, ensuring stable initialization for low-bit optimization; (2) rounding-aware outlier channel splitting, which mitigates quantization error while acting as an identity transform that preserves the quantized outputs; and (3) microscaling groups with E4M3 scales to capture dynamic activation ranges, aligned with OCP/NVIDIA practices. Furthermore, we exploit the nested structure of integer quantization grids to enable a single-run, once-for-any-precision model that can be directly deployed at multiple bit-widths without retraining.
We conduct comprehensive evaluations under both weight-only and weight-activation quantization settings. Under W2A2 quantization, Bit-by-Bit narrows the perplexity gap with full-precision models on WikiText2 to just 2.25, outperforming BitDistiller by 24.19 and EfficientQAT by 20.59 perplexity points on Llama2-7B. Moreover, on the Llama3 family, which is known for its quantization difficulty, Bit-by-Bit consistently surpasses other QAT baselines.
Code is available in the Appendix.
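To make the outlier-channel-splitting idea from the abstract concrete, here is a minimal NumPy sketch of the identity-transform property: an outlier input channel of a linear layer is split into two half-magnitude copies and the matching input feature is duplicated, so the layer output is unchanged while the channel's range shrinks. The helper name split_outlier_channel and the plain (non-rounding-aware) splitting rule are illustrative assumptions, not the paper's exact procedure.

import numpy as np

def split_outlier_channel(W, x, j):
    # Append a half-magnitude copy of input channel j of W and halve the original
    # column, then duplicate the matching input feature, so y = x @ W.T is unchanged.
    W_split = np.concatenate([W, W[:, j:j + 1] / 2.0], axis=1)
    W_split[:, j] /= 2.0
    x_split = np.concatenate([x, x[:, j:j + 1]], axis=1)
    return W_split, x_split

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
W[:, 3] *= 25.0                      # make channel 3 a heavy-tailed outlier
x = rng.normal(size=(4, 16))

W_s, x_s = split_outlier_channel(W, x, j=3)
assert np.allclose(x @ W.T, x_s @ W_s.T)     # exact identity in full precision
print(np.abs(W).max(), np.abs(W_s).max())    # the weight range shrinks after the split

Halving the outlier channel lets a uniform quantizer use a finer step for the remaining values; the paper's rounding-aware variant additionally accounts for quantization rounding so that the quantized, not just full-precision, outputs are preserved.

The once-for-any-precision claim rests on nested integer grids: with a shared scale, every symmetric low-bit code times a power-of-two shift is also a valid high-bit code, so lower-precision weights can be derived from a single high-precision checkpoint by dropping least-significant bits. The sketch below illustrates that nesting under assumed symmetric-grid conventions; quantize_symmetric and drop_to_lower_bits are hypothetical helpers, not the released implementation.

import numpy as np

def quantize_symmetric(w, bits):
    # Uniform symmetric quantization: signed codes in [-(2^(b-1)-1), 2^(b-1)-1].
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    codes = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return codes, scale

def drop_to_lower_bits(codes, scale, from_bits, to_bits):
    # Round away the extra LSBs; the coarser grid (step = scale * 2^(from-to))
    # is a subset of the original high-precision grid.
    shift = 2 ** (from_bits - to_bits)
    qmax = 2 ** (to_bits - 1) - 1
    low = np.clip(np.round(codes / shift), -qmax, qmax).astype(np.int32)
    return low, scale * shift

rng = np.random.default_rng(1)
w = rng.normal(size=1024)
c8, s8 = quantize_symmetric(w, bits=8)
c4, s4 = drop_to_lower_bits(c8, s8, from_bits=8, to_bits=4)
c2, s2 = drop_to_lower_bits(c8, s8, from_bits=8, to_bits=2)

# Nesting: every 4-bit (or 2-bit) reconstruction equals a valid 8-bit code times s8.
assert np.all(np.abs(c4 * 16) <= 127) and np.all(np.abs(c2 * 64) <= 127)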
Primary Area: foundation or frontier models, including LLMs
Submission Number: 1749