Compensate, Don't Reconstruct: Parameter- and Data-Efficient 2-bit LLM Quantization

ICLR 2026 Conference Submission17167 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: LLM, Quantization, PEFT
Abstract: The substantial memory footprint of large language models (LLMs) remains a key barrier to their on-device deployment. 2-bit quantization is a promising solution; however, current methods impose a difficult trade-off between the high accuracy of training-intensive Quantization-Aware Training (QAT) and the efficiency of lower-performing Quantization Error Compensation (QEC). Our analysis of QEC reveals a critical insight: its effectiveness depends more on minimizing activation discrepancy than on minimizing weight discrepancy alone. Building on this, we introduce LG-QEC, a framework that significantly enhances the compensation process. LG-QEC combines a hybrid adapter with a local-global optimization strategy to directly align activations and suppress quantization errors. Experiments show that LG-QEC achieves accuracy comparable to state-of-the-art QAT methods while using only a fraction of the training token budget and trainable parameters. This work bridges the gap between efficiency and performance, enabling accurate and practical 2-bit LLMs.
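The abstract's central claim can be illustrated numerically. The sketch below is not the paper's LG-QEC method; it is a minimal toy comparison, under assumed details (a simple symmetric 2-bit quantizer, a closed-form rank-r fit via truncated SVD, random Gaussian calibration data), showing that a low-rank correction fitted to the *activation* residual X(W - Wq) reduces activation error at least as much as the same-rank correction fitted to the *weight* residual W - Wq alone:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_2bit(W):
    # Assumed quantizer: symmetric 2-bit, per-row scale.
    # Reconstruction levels are {-3, -1, 1, 3} * s (integer codes in {-2, ..., 1}).
    s = np.abs(W).max(axis=1, keepdims=True) / 3.0
    codes = np.clip(np.round((W / s - 1.0) / 2.0), -2, 1)
    return (2.0 * codes + 1.0) * s

def weight_only_correction(dW, r):
    # Rank-r fit of the weight error dW itself (ignores activations).
    U, S, Vt = np.linalg.svd(dW, full_matrices=False)
    return (U[:, :r] * S[:r]) @ Vt[:r]

def activation_aware_correction(X, dW, r):
    # Rank-r correction C minimizing ||X dW - X C||_F:
    # truncated SVD of the activation residual, mapped back through pinv(X).
    U, S, Vt = np.linalg.svd(X @ dW, full_matrices=False)
    best = (U[:, :r] * S[:r]) @ Vt[:r]
    return np.linalg.pinv(X) @ best

# Toy linear layer: 64 calibration samples, 32 -> 16 projection (illustrative sizes).
X = rng.standard_normal((64, 32))
W = rng.standard_normal((32, 16))
Wq = quantize_2bit(W)
dW = W - Wq
r = 8  # adapter rank (assumed)

err_none = np.linalg.norm(X @ dW)                                          # no compensation
err_wt = np.linalg.norm(X @ (dW - weight_only_correction(dW, r)))          # weight-aligned
err_act = np.linalg.norm(X @ (dW - activation_aware_correction(X, dW, r))) # activation-aligned
```

Because the activation-aware fit minimizes the activation discrepancy directly over all rank-r corrections, `err_act <= err_wt` holds by construction on this toy layer, matching the abstract's observation that aligning activations matters more than shrinking weight error alone.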
Primary Area: optimization
Submission Number: 17167