Unified Progressive Quantization toward 2-bit Instruction-Tuned LLMs

ICLR 2026 Conference Submission 16608 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: LLM, 2-bit quantization, post-training quantization, quantization-aware training, PTQ, QAT
Abstract: As large language models (LLMs) scale, deploying them on edge devices becomes challenging, driving interest in ultra-low-bit quantization, particularly INT2. Through a derivation of quantization error bounds, we identify two key factors for effective 2-bit quantization of instruction-tuned LLMs: (1) progressive quantization is critical, i.e., introducing an intermediate 4-bit stage that quantizes FP16 to INT4 before reducing to INT2; (2) quantization-aware training (QAT) should minimize the divergence between the INT2 and FP16 output distributions, rather than optimizing the next-token prediction loss, in order to retain both general linguistic knowledge and instruction-following ability. Building on these analyses, we propose Unified Progressive Quantization (UPQ), which combines INT4 post-training quantization (PTQ) with distillation-based INT2 QAT. We conduct extensive ablations on quantization functions, intermediate bitwidths, and pre-/post-training datasets to offer practical and general guidance for 2-bit QAT. UPQ quantizes instruction-tuned LLMs to INT2 with open-source pre-training data, achieving state-of-the-art MMLU and IFEval results.
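To make the two ingredients in the abstract concrete, below is a minimal sketch of (a) a progressive fake-quantization path (FP16 -> INT4 -> INT2) and (b) a distillation objective that matches the quantized model's output distribution to the FP16 teacher's. The symmetric per-tensor quantizer, the plain KL loss, and all names (`fake_quantize`, `distillation_loss`, tensor shapes) are illustrative assumptions, not the paper's exact quantization function or loss, which the submission explores in its ablations.

```python
import torch
import torch.nn.functional as F


def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    # Symmetric per-tensor fake quantization (assumed for illustration only;
    # the paper ablates different quantization functions).
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale


# Progressive quantization: FP16 -> INT4 (PTQ stage) -> INT2 (QAT stage),
# rather than quantizing FP16 directly to INT2.
w_fp16 = torch.randn(256, 256).half()
w_int4 = fake_quantize(w_fp16.float(), bits=4)   # intermediate 4-bit stage
w_int2 = fake_quantize(w_int4, bits=2)           # final 2-bit weights


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    # KL divergence between the FP16 teacher's and the INT2 student's
    # next-token distributions, used during QAT in place of the usual
    # next-token prediction (cross-entropy) loss.
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)
```

In this sketch, the QAT stage would backpropagate `distillation_loss` through the fake-quantized student while the FP16 teacher stays frozen; the intermediate INT4 weights serve only as a better-conditioned starting point for the INT2 stage.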
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16608