An Empirical Study of Extremely Low-Bit Quantization for Large Language Models

04 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large Language Model, LLM Quantization, LLM compression, Quantization-aware training
Abstract: Recent research on LLM quantization has predominantly focused on post-training quantization (PTQ). While effective at higher bit-widths, PTQ suffers severe performance degradation in extremely low-bit settings (e.g., 2-bit), limiting its applicability in resource-constrained environments. In contrast, quantization-aware training (QAT) offers a promising way to recover the accuracy lost to quantization. However, due to its substantial demands on training data and computational resources, QAT remains largely underexplored for LLMs. In this work, we present a comprehensive empirical study of QAT for extremely low-bit quantized LLMs. We investigate critical factors affecting QAT effectiveness, including quantizer design, quantization granularity, initialization strategies, training data selection, and training hyperparameters. Based on these insights, we propose a general QAT recipe and validate it on LLaMA3 models, achieving state-of-the-art performance under extremely low-bit settings. All code and training details will be released to facilitate reproducibility and foster future research on QAT for LLMs.
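To make the setting concrete, the sketch below shows what "fake" 2-bit quantization looks like in a QAT forward pass: weights are quantized and immediately dequantized so the model sees the rounding error during training while the master weights stay in floating point. This is a minimal illustrative sketch, not the paper's actual quantizer; the symmetric per-tensor scheme, the signed level set {-2, -1, 0, 1}, and the function name are all assumptions for illustration — the paper studies alternative quantizer designs and granularities.

```python
import numpy as np

def fake_quantize_2bit(w: np.ndarray) -> np.ndarray:
    """Illustrative symmetric 2-bit fake quantization (quantize + dequantize).

    Assumes a per-tensor scale and signed integer levels in [-2, 1];
    real QAT recipes may use per-channel/per-group scales, asymmetric
    schemes, or learned clipping ranges instead.
    """
    qmin, qmax = -2, 1                       # signed 2-bit integer range
    scale = np.abs(w).max() / abs(qmin)      # map the largest weight to a level
    if scale == 0:
        return np.zeros_like(w)              # all-zero tensor: nothing to quantize
    q = np.clip(np.round(w / scale), qmin, qmax)  # integer codes
    return q * scale                         # dequantize back to float
```

In an actual QAT loop this forward-pass rounding is non-differentiable, so gradients are typically passed through it with a straight-through estimator; the training-data and hyperparameter choices the abstract mentions govern how well the model adapts to this injected error.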
Primary Area: foundation or frontier models, including LLMs
Submission Number: 1850