Gradient-Based LoRA Rank Allocation Under GRPO: An Empirical Study

08 May 2026 (modified: 09 May 2026) · ICML 2026 Workshop CoLoRAI Submission · CC BY 4.0
Keywords: LoRA, GRPO, rank allocation, parameter-efficient fine-tuning, reinforcement learning, gradient profiling, low-rank adaptation
TL;DR: Gradient-based adaptive LoRA rank allocation, which improves SFT, degrades performance under GRPO due to a flat gradient landscape and a novel gradient amplification effect.
Abstract: Adaptive rank allocation for LoRA, which assigns more parameters to important layers and fewer to unimportant ones, consistently improves efficiency under supervised fine-tuning (SFT). We investigate whether this success transfers to reinforcement learning, specifically Group Relative Policy Optimization (GRPO). Using gradient-magnitude profiling on Qwen 2.5 1.5B with GSM8K, we find that it does not: proportional rank allocation degrades accuracy by 4.5 points compared to uniform allocation (70.0% vs. 74.5%), despite using an identical parameter budget. We identify two mechanisms behind this failure. First, the gradient landscape under GRPO is fundamentally flatter than under SFT: the max-to-min layer importance ratio is only 2.17x, compared to more than 10x reported in the SFT literature. All layers carry meaningful gradient signal; none are truly idle. Second, we discover a gradient amplification effect: non-uniform allocation widens the importance spread from 2.17x to 3.00x, creating a positive feedback loop in which high-rank layers absorb more gradient while low-rank layers are progressively silenced. Our results suggest that gradient importance does not predict capacity requirements under RL, and that naive transfer of SFT-era rank allocation to alignment training should be avoided.
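The core idea of proportional allocation is simple: distribute a fixed total rank budget across layers in proportion to their gradient-magnitude importance scores. The sketch below illustrates this under stated assumptions; it is not the authors' released code, and all names (`allocate_ranks`, `layer_importance`, `total_rank_budget`, `min_rank`) are illustrative.

```python
# Minimal sketch (illustrative, not the authors' implementation):
# proportional LoRA rank allocation from per-layer gradient-magnitude
# importance, under a fixed total rank budget.

def allocate_ranks(layer_importance, total_rank_budget, min_rank=1):
    """Distribute a fixed rank budget across layers in proportion to
    per-layer gradient-magnitude importance scores."""
    total_importance = sum(layer_importance.values())
    ranks = {}
    for name, score in layer_importance.items():
        share = score / total_importance
        ranks[name] = max(min_rank, round(share * total_rank_budget))
    return ranks

# Hypothetical example: a flat importance profile (max/min ratio ~2x, as the
# paper reports under GRPO) yields a modest rank spread, leaving proportional
# allocation little room to outperform a uniform split.
importance = {"layer.0": 1.0, "layer.1": 1.4, "layer.2": 1.8, "layer.3": 2.2}
print(allocate_ranks(importance, total_rank_budget=32))
# -> ranks of roughly 5, 7, 9, 11
```

Under the flatter GRPO importance profile reported in the abstract, such an allocation stays close to uniform, while the paper's gradient amplification effect suggests that even these modest rank differences can feed back into a wider importance spread during training.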
Submission Number: 105