Abstract: Quantization has played a significant role in enabling large language models to operate efficiently. Quantization-Aware Training (QAT) compensates for the loss incurred during the quantization process through training and has demonstrated promising results. However, in the case of extremely low-bit quantization, such as 3 bits, the performance of QAT degrades significantly. In this paper, we delve into the challenges associated with data selection and training approaches. Specifically, we first analyze which type of data yields better results when applying ultra-low-bit QAT to base or chat models. Building on this analysis, we further propose an iterative training approach that enhances the stability of model quantization under extremely low-bit configurations. Experimental evaluations demonstrate the effectiveness of the proposed method.
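As background for the abstract's description of QAT (compensating quantization error through training), below is a minimal sketch of fake quantization with a straight-through estimator (STE) in PyTorch. The 3-bit symmetric setting, class names, and toy training step are illustrative assumptions, not the paper's specific data selection or iterative training method.

```python
# Minimal QAT sketch: fake-quantize weights in the forward pass and pass
# gradients straight through in the backward pass (STE). Illustrative only.
import torch
import torch.nn as nn

class FakeQuantSTE(torch.autograd.Function):
    """Round weights to a low-bit grid on the forward pass; use an identity
    gradient on the backward pass (straight-through estimator)."""
    @staticmethod
    def forward(ctx, w, num_bits):
        qmax = 2 ** (num_bits - 1) - 1            # e.g. 3 for signed 3-bit
        scale = w.abs().max() / qmax + 1e-8       # per-tensor symmetric scale (assumption)
        return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None                  # STE: gradient passes through

class QuantLinear(nn.Module):
    """Linear layer whose weights are fake-quantized during training."""
    def __init__(self, in_features, out_features, num_bits=3):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.num_bits = num_bits

    def forward(self, x):
        w_q = FakeQuantSTE.apply(self.linear.weight, self.num_bits)
        return nn.functional.linear(x, w_q, self.linear.bias)

# Toy training step: the full-precision shadow weights are updated via STE
# gradients, which is how QAT compensates for quantization error.
layer = QuantLinear(16, 4, num_bits=3)
opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
x, target = torch.randn(8, 16), torch.randn(8, 4)
loss = nn.functional.mse_loss(layer(x), target)
loss.backward()
opt.step()
```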
External IDs: dblp:conf/www/DuJHL0X25