AdaNF: Quantization Group Adaptive NormalFloat for Low Bit Fine-tuning of LLMs

Published: 21 Jun 2024, Last Modified: 24 Jul 2024 · ES-FoMo-II 2024 Poster · CC BY 4.0
Keywords: LLM, Quantization, LoRA, Low bit fine-tuning, NormalFloat
Abstract: The integration of quantization and Low-Rank Adaptation (LoRA) presents a promising avenue for the memory-efficient fine-tuning of large language models (LLMs) within GPU memory constraints. QLoRA, introduced by \cite{dettmers2024qlora}, successfully demonstrates high-fidelity 4-bit fine-tuning using an information-theoretically optimal datatype, NormalFloat. However, challenges arise with lower-bit fine-tuning, such as 2-bit, where QLoRA often struggles to converge due to the significant information loss incurred by quantization. In this study, we address these challenges by adjusting the cumulative distribution function (CDF) offset of NormalFloat, which significantly reduces information loss through improved NormalFloat initialization. Furthermore, we introduce quantization-group \textbf{Ada}ptive \textbf{N}ormal\textbf{F}loat (AdaNF), a technique that dynamically adjusts the NormalFloat CDF offset based on the statistical characteristics of each quantization group in the parameters. This adaptive approach minimizes the $L_p$ norm of the quantization error via a grid search over the offset, allowing for customized quantization that preserves more information. Our empirical investigations across various models and downstream tasks in the low-bit fine-tuning regime confirm that our method performs on par with existing methods while mitigating the convergence issues that prior approaches exhibit at low bit widths.
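The abstract describes the core mechanism: NormalFloat levels are built from quantiles of a standard normal distribution whose probability range is controlled by a CDF offset, and AdaNF grid-searches that offset per quantization group to minimize the $L_p$ norm of the quantization error. The sketch below illustrates this idea under stated assumptions; the function names (`normalfloat_levels`, `quantize_group`, `adaptive_offset_quantize`), the offset grid, and the simplified symmetric level construction are illustrative choices, not the paper's exact implementation (e.g., NF4's asymmetric handling of zero is omitted).

```python
import torch
from scipy.stats import norm


def normalfloat_levels(num_bits: int, offset: float) -> torch.Tensor:
    """Build NormalFloat-style levels from standard-normal quantiles.

    Quantile positions are spread evenly between `offset` and `1 - offset`,
    then the resulting quantiles are normalized to [-1, 1]. A smaller offset
    pushes the outer levels further into the distribution's tails.
    """
    n = 2 ** num_bits
    probs = torch.linspace(offset, 1.0 - offset, n)
    levels = torch.tensor(norm.ppf(probs.numpy()), dtype=torch.float32)
    return levels / levels.abs().max()


def quantize_group(w: torch.Tensor, levels: torch.Tensor):
    """Absmax-normalize one quantization group and snap values to the nearest level."""
    scale = w.abs().max()
    x = w / scale
    idx = (x.unsqueeze(-1) - levels).abs().argmin(dim=-1)  # nearest-level indices
    dequantized = levels[idx] * scale
    return dequantized, idx, scale


def adaptive_offset_quantize(w: torch.Tensor, num_bits: int = 2, p: float = 2.0,
                             offset_grid=(0.01, 0.02, 0.03, 0.05, 0.08, 0.10)):
    """Grid-search the CDF offset for one group, keeping the offset that
    minimizes the L_p norm of the quantization error (AdaNF-style sketch)."""
    best_err, best = float("inf"), None
    for off in offset_grid:
        levels = normalfloat_levels(num_bits, off)
        deq, idx, scale = quantize_group(w, levels)
        err = (deq - w).abs().pow(p).sum().pow(1.0 / p).item()
        if err < best_err:
            best_err, best = err, (idx, scale, off)
    return best  # (indices, group scale, chosen offset)


# Usage example on a single hypothetical 64-element quantization group.
group = torch.randn(64)
indices, scale, chosen_offset = adaptive_offset_quantize(group, num_bits=2, p=2.0)
print(f"chosen offset: {chosen_offset}, scale: {scale.item():.4f}")
```

Because the offset is chosen independently per group from its own error statistics, groups with heavier tails or tighter concentration each get a level layout better matched to their distribution, which is the source of the reduced information loss claimed in the abstract.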
Supplementary Material: zip
Submission Number: 28