Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various natural language processing (NLP) tasks. However, their deployment is challenging due to the substantial computational resources required. Power-of-two (PoT) quantization is a general tool for mitigating this difficulty. Although weights quantized by previous PoT methods can be dequantized efficiently on CPUs using fixed-point addition, these methods are less effective on GPUs because dequantization entangles the sign bit and requires sequential bit manipulations. We propose a novel PoT quantization framework for LLM weights that (i) outperforms the accuracy of state-of-the-art methods in extremely low-precision number formats, and (ii) enables faster inference through more efficient dequantization. To maintain the accuracy of the quantized model, we introduce a two-step post-training algorithm: (i) initialize the quantization scales with a robust starting point, and (ii) refine these scales using a minimal calibration set. Our PoT post-training algorithm surpasses the current state-of-the-art in integer quantization, particularly at low precisions such as 2- and 3-bit formats. Our PoT quantization also accelerates the dequantization step required for floating-point inference, yielding a 3.67× speedup on an NVIDIA V100 and a 1.63× speedup on an NVIDIA RTX 4090, compared to uniform integer dequantization.
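To make the contrast between the two dequantization schemes concrete, the sketch below dequantizes weights from a uniform integer code and from a PoT code. The 3-bit layout (one sign bit plus a two-bit exponent), the function names, and the NumPy implementation are illustrative assumptions for exposition, not the kernel described in the paper.

```python
import numpy as np

def dequant_uniform(q, scale, zero_point):
    # Uniform integer dequantization: w ~= scale * (q - zero_point).
    # One integer subtraction and one floating-point multiply per weight.
    return scale * (q.astype(np.float32) - zero_point)

def dequant_pot(sign, exponent, scale):
    # Power-of-two dequantization: w ~= scale * (-1)^sign * 2^(-exponent).
    # The 2^(-exponent) factor can in principle be folded into the exponent
    # field of `scale` (an exponent/fixed-point addition) instead of a multiply.
    return scale * np.where(sign == 0, 1.0, -1.0) * np.exp2(-exponent.astype(np.float32))

# Toy 3-bit PoT codes: high bit = sign, low two bits = exponent (assumed layout).
codes = np.array([0b000, 0b101, 0b011, 0b110], dtype=np.uint8)
sign = codes >> 2
exponent = codes & 0b11
print(dequant_pot(sign, exponent, scale=0.5))        # [0.5, -0.25, 0.0625, -0.125]

# Uniform 3-bit codes for comparison, with an assumed zero point of 4.
q = np.array([3, 1, 7, 0], dtype=np.uint8)
print(dequant_uniform(q, scale=0.5, zero_point=4))   # [-0.5, -1.5, 1.5, -2.0]
```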