Keywords: Efficient Fine-Tuning, NLP, Iterative Optimization, Layer-wise Quantization and Low-Rank Configuration
TL;DR: A method for obtaining a high-performance, low-precision fine-tuned model.
Abstract: Parameter-Efficient Fine-Tuning (PEFT) and quantization have emerged to address the memory consumption and computational efficiency issues of fine-tuning large language models (LLMs). Recent studies combine the two and adjust model parameters before fine-tuning to reduce quantization error, aiming to improve fine-tuning performance. We find that fine-tuning these adjusted quantized models performs even worse than fine-tuning the original quantized models directly, because the adjusted model is essentially a different model from the original quantized one. We also observe that, due to the poor robustness of quantized models, increasing the training difficulty can lead to even worse outcomes. To address this, we propose two constraints for fine-tuning quantized models and, based on them, introduce a general fine-tuning framework called QR-Adaptor. This framework bypasses the network errors introduced by quantization and instead uses actual task performance and memory as the optimization targets. Through initialization, extrapolation, and interpolation, it quickly solves this gradient-free optimization problem. Experimental results demonstrate that our method yields fine-tuned low-bit quantized models that outperform fine-tuned 16-bit models while maintaining the same memory footprint as fine-tuning 4-bit models. For example, in the zero-shot MMLU evaluation, it improves accuracy by 3.3% over both LoftQ and LQ-LoRA.
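As a rough illustration only (not the paper's QR-Adaptor algorithm), the sketch below shows what a gradient-free search over per-layer quantization bit-widths and LoRA ranks, driven by measured accuracy under a memory budget, might look like. The candidate sets, the memory model, and `evaluate_accuracy` are all hypothetical placeholders standing in for a real quantize-fine-tune-evaluate pipeline.

```python
"""Minimal sketch of a gradient-free layer-wise (bit-width, LoRA-rank) search.

This is not the authors' implementation; it only illustrates treating per-layer
quantization bits and LoRA ranks as a discrete configuration optimized directly
against measured accuracy and memory, rather than against quantization error.
"""
import random

NUM_LAYERS = 8
BIT_CHOICES = [2, 3, 4, 8]        # candidate per-layer bit-widths (assumed)
RANK_CHOICES = [4, 8, 16, 32]     # candidate per-layer LoRA ranks (assumed)
MEMORY_BUDGET = NUM_LAYERS * 5.0  # toy budget in arbitrary units


def memory_cost(config):
    """Toy memory model: bit-width dominates, rank adds a small overhead."""
    return sum(bits + 0.05 * rank for bits, rank in config)


def evaluate_accuracy(config):
    """Hypothetical proxy for fine-tuning the configured model and scoring it.

    A real pipeline would quantize each layer to its bit-width, attach LoRA
    adapters of the chosen ranks, fine-tune, and evaluate on a benchmark such
    as MMLU. Here we fake a smooth objective so the sketch runs end to end.
    """
    return sum(bits * 0.1 + rank * 0.01 for bits, rank in config) / NUM_LAYERS


def feasible(config):
    return memory_cost(config) <= MEMORY_BUDGET


def random_feasible_config():
    while True:
        config = [(random.choice(BIT_CHOICES), random.choice(RANK_CHOICES))
                  for _ in range(NUM_LAYERS)]
        if feasible(config):
            return config


def search(num_init=20, num_steps=50, seed=0):
    """Initialization plus local refinement; a simple stand-in for a staged
    gradient-free optimizer over the discrete configuration space."""
    random.seed(seed)
    # Initialization: sample feasible configurations and keep the best one.
    best = max((random_feasible_config() for _ in range(num_init)),
               key=evaluate_accuracy)
    best_acc = evaluate_accuracy(best)
    # Refinement: perturb one layer at a time, accept feasible improvements.
    for _ in range(num_steps):
        trial = list(best)
        layer = random.randrange(NUM_LAYERS)
        trial[layer] = (random.choice(BIT_CHOICES), random.choice(RANK_CHOICES))
        if feasible(trial) and evaluate_accuracy(trial) > best_acc:
            best, best_acc = trial, evaluate_accuracy(trial)
    return best, best_acc


if __name__ == "__main__":
    config, acc = search()
    print("per-layer (bits, rank):", config)
    print("proxy accuracy: %.3f, memory: %.1f" % (acc, memory_cost(config)))
```

The sketch uses a single accuracy objective under a hard memory budget; the paper's framework treats performance and memory jointly, so the objective and the initialization/extrapolation/interpolation stages would differ in a faithful implementation.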
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 726