Keywords: Large Language Models, Parameter-Efficient Fine-Tuning, Low-Rank Approximation
TL;DR: We propose GradNormLoRP, a memory- and parameter-efficient fine-tuning method for LLMs that matches full fine-tuning performance while enabling training on consumer GPUs.
Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks. However, the increasing computational demands—especially from full fine-tuning—pose significant challenges in terms of efficiency and scalability. Parameter-efficient fine-tuning (PEFT) methods have been proposed to mitigate these costs, but they often lag behind full fine-tuning in performance and remain memory-intensive.
In this work, we introduce Gradient Weight-Normalized Low-Rank Projection (GradNormLoRP), a novel approach that improves both parameter and memory efficiency while achieving performance on par with full fine-tuning. GradNormLoRP normalizes weight matrices to enhance gradient conditioning, which promotes better convergence during optimization. It further employs low-rank approximations for both weight and gradient matrices, significantly reducing memory usage during training.
Extensive experiments show that 8-bit GradNormLoRP reduces optimizer memory consumption by up to 89.5%, enabling the pre-training of large LLMs such as LLaMA 7B on consumer-grade GPUs like the NVIDIA RTX 4090—without incurring additional inference costs. On downstream tasks, GradNormLoRP also outperforms existing low-rank methods. For example, fine-tuning RoBERTa on the full GLUE benchmark with rank 8 yields an average score of 80.65, surpassing LoRA’s 79.23.
These results highlight GradNormLoRP as a promising and practical alternative for efficient LLM pre-training and fine-tuning.
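The two ingredients the abstract describes, weight normalization for better gradient conditioning and low-rank projection of gradients to shrink optimizer state, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the rank choice, the SVD-based projector, and the memory accounting below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weight matrix for one linear layer (m x n), illustrative rank r.
m, n, r = 64, 32, 8
W = rng.standard_normal((m, n))

# --- Weight normalization (gradient conditioning) ---
# Reparameterize each column as W[:, j] = g[j] * V[:, j] / ||V[:, j]||,
# decoupling the direction of each column from its scale.
V = W.copy()
g = np.linalg.norm(V, axis=0)            # per-column scale, shape (n,)
W_norm = g * (V / np.linalg.norm(V, axis=0))

# --- Low-rank gradient projection (optimizer memory savings) ---
G = rng.standard_normal((m, n))          # stand-in full-rank gradient
U, _, _ = np.linalg.svd(G, full_matrices=False)
P = U[:, :r]                             # m x r projector (top-r subspace)
G_low = P.T @ G                          # r x n: optimizer state lives here
G_back = P @ G_low                       # m x n rank-r update, mapped back

# Per-matrix optimizer state shrinks from m*n to r*n (+ m*r projector).
full_state = m * n
low_state = r * n + m * r
print(full_state, low_state)             # 2048 vs 768 entries
```

The memory argument is the key point: for each weight matrix, optimizer moments are stored in the projected `r x n` space rather than the full `m x n` space, which is where the reported up-to-89.5% optimizer-memory reduction comes from when combined with 8-bit state quantization.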
Submission Number: 169