Qronos: Correcting the Past by Shaping the Future... in Post-Training Quantization

ICLR 2026 Conference Submission 21023 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Post-Training Quantization, Large Language Models, Foundation Models, Efficient Machine Learning
Abstract: We introduce Qronos, a new post-training quantization algorithm that not only explicitly corrects errors due to both weight and activation quantization, but also corrects errors accumulated from previously quantized layers. Our iterative algorithm is based on an interpretable and disciplined optimization framework that surpasses existing data-driven approaches. At each step, Qronos alternates between error correction and diffusion via optimal update rules. Importantly, we prove that Qronos admits an equivalent formulation that significantly improves algorithmic efficiency; we use this equivalence to reduce peak memory usage by $18\times$ on Llama3 8B, and our scaling analysis shows a speedup of up to $13.8\times$ for a single-layer microbenchmark. We demonstrate compatibility with existing transformation techniques such as Hadamard-based incoherence processing and weight-activation scaling equalization, among others. We evaluate Qronos on recent language models in the Llama3 and Qwen3 families; Qronos consistently outperforms previous state-of-the-art adaptive rounding methods when quantizing the weights, activations, and/or KV caches to 4 bits or fewer.
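The "error correction and diffusion" the abstract describes belongs to the family of layer-wise, data-driven rounding used by GPTQ-style methods: commit one group of weights to the quantization grid, then compensate the not-yet-quantized weights so that the layer output on calibration data changes as little as possible. The sketch below is only a generic illustration of that pattern, not the optimal update rules derived in the paper; the toy 4-bit quantizer, damping constant, and shapes are assumptions for the demo.

```python
# Illustrative sketch only: a generic "correct, then diffuse" pass over a weight
# matrix, in the spirit of prior data-driven rounding methods such as GPTQ.
# This is NOT the Qronos update rule; it only shows the pattern the abstract refers to.
import numpy as np

def quantize(v, scale=0.1):
    # Toy symmetric 4-bit uniform quantizer (assumed for illustration).
    return scale * np.clip(np.round(v / scale), -8, 7)

def correct_and_diffuse(W, X, damp=1e-2):
    """Quantize W (d_in, d_out) against calibration inputs X (n, d_in).

    Each step (i) commits one input dimension's weights to the grid
    ("error correction") and (ii) spreads the induced output error onto the
    not-yet-quantized rows ("diffusion") using the inverse input Hessian.
    """
    W = W.astype(np.float64).copy()
    d_in = W.shape[0]
    H = X.T @ X
    H += damp * np.mean(np.diag(H)) * np.eye(d_in)   # damping for invertibility
    for i in range(d_in):
        # Inverse Hessian restricted to the not-yet-quantized indices i..d_in-1.
        Hinv = np.linalg.inv(H[i:, i:])
        q = quantize(W[i])                            # correction: commit row i
        err = (W[i] - q) / Hinv[0, 0]
        W[i] = q
        if i + 1 < d_in:
            W[i + 1:] -= np.outer(Hinv[1:, 0], err)   # diffusion onto remaining rows
    return W

# Example: quantize a random 64x32 layer with 256 calibration samples.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32)) * 0.2
X = rng.standard_normal((256, 64))
W_q = correct_and_diffuse(W, X)
```

Re-inverting the restricted Hessian at every step, as above, is done only for clarity; practical implementations avoid it (e.g., via a Cholesky factorization of the inverse Hessian). The efficiency and memory figures quoted in the abstract refer to the paper's own equivalent reformulation, which is not reproduced here.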
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 21023