Low Rank Quantization Adaptation for Large Language Model

ICLR 2025 Conference Submission 1723 Authors

19 Sept 2024 (modified: 23 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Quantization, Low-Rank Adaptation, LLM
Abstract: As the parameters of Large Language Models (LLMs) increase, quantization has emerged as a potent strategy for model compression and acceleration. Concurrently, Low-Rank Adaptation (LoRA) has been recognized as an effective method for enhancing LLM performance. However, integrating LoRA with quantization presents significant challenges, particularly in preserving the quantization format after model optimization. In this paper, we introduce Low rank Quantization Adaptation (LoQA) for LLM, a novel approach that effectively fine-tunes holistic quantization parameters. Specifically, we first propose a new perspective of quantization operator, which is compatiable with LoRA and mathematically equivalent to the original operator. In this way, all the parameters (scale and zero point) are finetuned simultaneously, and thus yields notable improvements in model performance.Thanks to the expanded optimization landscape, LoQA is broadly applicabile to various Post-Training Quantization (PTQ) techniques, ensuring better generalizability in practical deployments. To maintain the stability of the optimization, we further propose a LoRA scaling strategy that leverages quantization data to adjust the norm of the low rank adaptation, regulating the speed of convergence in optimization and preventing inappropriate LoRA scaling, which could lead to overfitting or underfitting. Compared to existing methods, LoQA consistently achieves performance gains across a wide range of models, proving its effectiveness and adaptability.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1723