Keywords: Large Language Models, Parameter-Efficient Fine-Tuning, Low-Rank Approximation
TL;DR: We propose GradNormLoRP, a memory- and parameter-efficient fine-tuning method for LLMs that matches full fine-tuning performance while enabling training on consumer GPUs.
Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks. However, the increasing computational demands—especially from full fine-tuning—pose significant challenges in terms of efficiency and scalability. Parameter-efficient fine-tuning (PEFT) methods have been proposed to mitigate these costs, but they often lag behind full fine-tuning in performance and remain memory-intensive.
In this work, we introduce Gradient Weight-Normalized Low-Rank Projection (GradNormLoRP), a novel approach that improves both parameter and memory efficiency while achieving performance on par with full fine-tuning. GradNormLoRP normalizes weight matrices to enhance gradient conditioning, which promotes better convergence during optimization. It further employs low-rank approximations for both weight and gradient matrices, significantly reducing memory usage during training.
Extensive experiments show that 8-bit GradNormLoRP reduces optimizer memory consumption by up to 89.5%, enabling the pre-training of large LLMs such as LLaMA 7B on consumer-grade GPUs like the NVIDIA RTX 4090—without incurring additional inference costs. On downstream tasks, GradNormLoRP also outperforms existing low-rank methods. For example, fine-tuning RoBERTa on the full GLUE benchmark with rank 8 yields an average score of 80.65, surpassing LoRA’s 79.23.
These results highlight GradNormLoRP as a promising and practical alternative for efficient LLM pre-training and fine-tuning.
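The two ingredients the abstract describes, weight normalization for better gradient conditioning and low-rank projection of gradients to shrink optimizer state, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the rank choice, the SVD-based projector, and the memory accounting below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weight matrix for one linear layer (m x n), illustrative rank r.
m, n, r = 64, 32, 8
W = rng.standard_normal((m, n))

# --- Weight normalization (gradient conditioning) ---
# Reparameterize each column as W[:, j] = g[j] * V[:, j] / ||V[:, j]||,
# decoupling the direction of each column from its scale.
V = W.copy()
g = np.linalg.norm(V, axis=0)            # per-column scale, shape (n,)
W_norm = g * (V / np.linalg.norm(V, axis=0))

# --- Low-rank gradient projection (optimizer memory savings) ---
G = rng.standard_normal((m, n))          # stand-in full-rank gradient
U, _, _ = np.linalg.svd(G, full_matrices=False)
P = U[:, :r]                             # m x r projector (top-r subspace)
G_low = P.T @ G                          # r x n: optimizer state lives here
G_back = P @ G_low                       # m x n rank-r update, mapped back

# Per-matrix optimizer state shrinks from m*n to r*n (+ m*r projector).
full_state = m * n
low_state = r * n + m * r
print(full_state, low_state)             # 2048 vs 768 entries
```

The memory argument is the key point: for each weight matrix, optimizer moments are stored in the projected `r x n` space rather than the full `m x n` space, which is where the reported up-to-89.5% optimizer-memory reduction comes from when combined with 8-bit state quantization.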
Submission Number: 169