NeUQI: Near-Optimal Uniform Quantization Parameter Initialization

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: post-training quantization
TL;DR: We identify the limitations of the conventional Min-Max initialization strategy, move beyond its constraints, and propose NeUQI, a method for near-optimal quantization parameter initialization in uniform quantization.
Abstract: Large language models (LLMs) achieve impressive performance across domains but face significant challenges when deployed on consumer-grade GPUs or personal devices such as laptops, due to high memory consumption and inference costs. Post-training quantization (PTQ) of LLMs offers a promising solution that reduces their memory footprint and decoding latency. In practice, PTQ with a uniform quantization representation is favored for its efficiency and ease of deployment, as uniform quantization is widely supported by mainstream hardware and software libraries. Recent studies on $\geq 2$-bit uniform quantization have led to noticeable improvements in post-quantization model performance; however, they focus mainly on quantization methodologies, while the initialization of quantization parameters remains underexplored and still relies on the conventional Min-Max strategy. In this work, we identify the limitations of the Min-Max strategy, move beyond its constraints, and propose **NeUQI**, a method that efficiently determines a near-optimal initialization for uniform quantization. NeUQI derives the near-optimal zero-point for any given scale, thereby reducing the initialization problem to a scale-only optimization that can be solved efficiently. Benefiting from the improved quantization parameters, NeUQI consistently outperforms existing methods in experiments with the LLaMA and Qwen model families across various settings and tasks. Furthermore, when combined with a lightweight distillation strategy, NeUQI even achieves performance superior to PV-tuning, a considerably more resource-intensive method.
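To make the contrast in the abstract concrete, here is a minimal, illustrative sketch of the two initialization ideas it describes: the conventional Min-Max rule, which fixes the scale and zero-point from the weight range, versus searching over candidate scales with a per-scale choice of zero-point. This is not the authors' NeUQI algorithm (which derives the near-optimal zero-point analytically rather than by the crude grid used here); all function names and grid ranges below are illustrative assumptions.

```python
import numpy as np

def minmax_init(w, bits=2):
    # Conventional Min-Max initialization: scale and zero-point are
    # chosen so the quantization grid exactly covers [w.min(), w.max()].
    qmax = 2 ** bits - 1
    scale = (w.max() - w.min()) / qmax
    zero = -w.min() / scale
    return scale, zero

def dequantize(w, scale, zero, bits=2):
    # Uniform quantization followed by dequantization (round-to-nearest,
    # clipped to the representable integer range [0, 2^bits - 1]).
    qmax = 2 ** bits - 1
    q = np.clip(np.round(w / scale + zero), 0, qmax)
    return scale * (q - zero)

def scale_search_init(w, bits=2, n_scale=64, n_zero=32):
    # Illustrative scale-centric search: for each candidate scale, pick
    # the zero-point minimizing the squared error. NeUQI replaces the
    # inner grid with a near-optimal closed-form zero-point, turning
    # this into an efficient scale-only problem; this sketch only
    # demonstrates the search structure, not that derivation.
    qmax = 2 ** bits - 1
    s0, z0 = minmax_init(w, bits)
    best_err = np.mean((w - dequantize(w, s0, z0, bits)) ** 2)
    best_s, best_z = s0, z0
    for s in np.linspace(0.3 * s0, 1.2 * s0, n_scale):
        for z in np.linspace(0.0, qmax, n_zero):
            err = np.mean((w - dequantize(w, s, z, bits)) ** 2)
            if err < best_err:
                best_err, best_s, best_z = err, s, z
    return best_s, best_z, best_err
```

Because the search is seeded with the Min-Max solution, its reconstruction error can only match or improve on Min-Max, which illustrates why Min-Max initialization is in general suboptimal at low bit widths.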
Supplementary Material: zip
Primary Area: optimization
Submission Number: 8379