Keywords: large language model; post-training quantization
Abstract: Post-training quantization is an effective method for reducing the serving cost of large language models, and the standard approach is round-to-nearest (RTN) quantization, which maps each weight to the nearest quantization level. However, this approach often suffers from large errors caused by outliers in the weights.
Proposed mitigation mechanisms include adaptive rounding, random rotation transformations, or optimizing toward a post-training target using calibration data. Unfortunately, this reliance on calibration data can be severely limiting in many real-world scenarios, as such data may be unavailable or subject to privacy regulations. In this paper, we propose algorithms to optimize transformations and adaptive rounding without access to any calibration data. The optimization is achieved by designing a suitable proxy function for the quantization loss that requires no calibration data. To maintain inference efficiency, we apply structured matrix transformations to individual weight matrices. For paired weights that interact directly in the computation graph, we use dual matrix transformations and adaptive rounding methods. We conduct experiments on Gemma 2 models and observe consistent improvements over the baselines. For Gemma 2 9B quantization, our method improves the average benchmark score from 61.9 to 62.4 for 4-bit quantization and from 52.0 to 60.6 for 3-bit quantization, while adding less than 3\% computational overhead. Furthermore, our method achieves performance comparable to the commonly used GPTQ method, which requires calibration data.
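To make the quantization setting in the abstract concrete, the following is a minimal sketch of round-to-nearest quantization and of an orthogonal rotation folded into the computation so that the matrix product is preserved. It is an illustrative example under assumed conventions (symmetric per-row scales, a random QR-based rotation), not the paper's calibration-free optimization; helper names such as `rtn_quantize` and `random_rotation` are hypothetical.

```python
# Sketch (not the paper's implementation): round-to-nearest (RTN) quantization
# of a weight matrix, optionally preceded by an orthogonal rotation that
# spreads outlier weights across coordinates before quantization.
import numpy as np

def rtn_quantize(w: np.ndarray, bits: int = 4):
    """Symmetric per-row round-to-nearest quantization."""
    levels = 2 ** (bits - 1) - 1                        # e.g. 7 levels for 4-bit symmetric
    scale = np.abs(w).max(axis=1, keepdims=True) / levels + 1e-12
    q = np.clip(np.round(w / scale), -levels, levels)   # nearest quantization level
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

def random_rotation(dim: int, seed: int = 0) -> np.ndarray:
    """Random orthogonal matrix obtained from a QR decomposition."""
    rng = np.random.default_rng(seed)
    qmat, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return qmat

# For an orthogonal R, x @ W == (x @ R) @ (R.T @ W), so R.T @ W can be
# quantized in place of W while R is folded into the preceding computation.
dim = 64
W = np.random.default_rng(1).standard_normal((dim, dim)).astype(np.float32)
W[0, 0] = 25.0                                          # artificial outlier weight
R = random_rotation(dim)

err_plain = np.linalg.norm(W - dequantize(*rtn_quantize(W, bits=4)))
W_rot = R.T @ W
err_rot = np.linalg.norm(W_rot - dequantize(*rtn_quantize(W_rot, bits=4)))
print(f"RTN error without rotation: {err_plain:.3f}, with rotation: {err_rot:.3f}")
```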
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14118