Abstract: Quantization has played a significant role in enabling large language models to operate efficiently. Quantization-Aware Training (QAT) compensates for the loss incurred during the quantization process through training and has demonstrated promising results. However, in the case of extremely low-bit quantization, such as 3 bits, the performance of QAT degrades significantly. In this paper, we delve into the challenges associated with data selection and training approaches. Specifically, we first analyze which type of data yields better results when applying ultra-low-bit QAT to base or chat models. Building on this analysis, we further propose an iterative training approach that enhances the stability of model quantization under extremely low-bit configurations. Experimental evaluations demonstrate the effectiveness of the proposed method.
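As background for the abstract's description of QAT (compensating quantization error through training), below is a minimal sketch of fake quantization with a straight-through estimator (STE) in PyTorch. The 3-bit symmetric setting, class names, and toy training step are illustrative assumptions, not the paper's specific data selection or iterative training method.

```python
# Minimal QAT sketch: fake-quantize weights in the forward pass and pass
# gradients straight through in the backward pass (STE). Illustrative only.
import torch
import torch.nn as nn

class FakeQuantSTE(torch.autograd.Function):
    """Round weights to a low-bit grid on the forward pass; use an identity
    gradient on the backward pass (straight-through estimator)."""
    @staticmethod
    def forward(ctx, w, num_bits):
        qmax = 2 ** (num_bits - 1) - 1            # e.g. 3 for signed 3-bit
        scale = w.abs().max() / qmax + 1e-8       # per-tensor symmetric scale (assumption)
        return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None                  # STE: gradient passes through

class QuantLinear(nn.Module):
    """Linear layer whose weights are fake-quantized during training."""
    def __init__(self, in_features, out_features, num_bits=3):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.num_bits = num_bits

    def forward(self, x):
        w_q = FakeQuantSTE.apply(self.linear.weight, self.num_bits)
        return nn.functional.linear(x, w_q, self.linear.bias)

# Toy training step: the full-precision shadow weights are updated via STE
# gradients, which is how QAT compensates for quantization error.
layer = QuantLinear(16, 4, num_bits=3)
opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
x, target = torch.randn(8, 16), torch.randn(8, 4)
loss = nn.functional.mse_loss(layer(x), target)
loss.backward()
opt.step()
```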
External IDs: dblp:conf/www/DuJHL0X25