Low Rank Quantization Adaptation for Large Language Model

ICLR 2025 Conference Submission 1723 Authors

19 Sept 2024 (modified: 23 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Quantization, Low-Rank Adaptation, LLM
Abstract: As the parameters of Large Language Models (LLMs) increase, quantization has emerged as a potent strategy for model compression and acceleration. Concurrently, Low-Rank Adaptation (LoRA) has been recognized as an effective method for enhancing LLM performance. However, integrating LoRA with quantization presents significant challenges, particularly in preserving the quantization format after model optimization. In this paper, we introduce Low rank Quantization Adaptation (LoQA) for LLM, a novel approach that effectively fine-tunes holistic quantization parameters. Specifically, we first propose a new perspective of quantization operator, which is compatiable with LoRA and mathematically equivalent to the original operator. In this way, all the parameters (scale and zero point) are finetuned simultaneously, and thus yields notable improvements in model performance.Thanks to the expanded optimization landscape, LoQA is broadly applicabile to various Post-Training Quantization (PTQ) techniques, ensuring better generalizability in practical deployments. To maintain the stability of the optimization, we further propose a LoRA scaling strategy that leverages quantization data to adjust the norm of the low rank adaptation, regulating the speed of convergence in optimization and preventing inappropriate LoRA scaling, which could lead to overfitting or underfitting. Compared to existing methods, LoQA consistently achieves performance gains across a wide range of models, proving its effectiveness and adaptability.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1723