Balanced LoRA: Removing Parameter Invariance to Accelerate Convergence

Valérie Castin; Kimia Nadjahi; Pierre Ablin; Gabriel Peyré

Published: 30 Apr 2026, Last Modified: 24 Jun 2026. ICML 2026 regular. CC BY 4.0.

TL;DR: BaLoRA accelerates convergence and improves robustness to hyperparameter tuning by projecting LoRA iterates onto a balanced manifold.

Abstract: Low-Rank Adaptation (LoRA) is the most widely adopted method for fine-tuning large language models. Notably, LoRA is inherently overparameterized: multiple pairs of low-rank factors can yield the same adapted weight matrix. We show—both theoretically and empirically—that these pairs exhibit significantly different condition numbers. As a result, converging to different loss minimizers directly impacts the convergence rate of LoRA. Building on this observation, we introduce Balanced Low-Rank Adaptation (BaLoRA), a variant of LoRA that projects iterates onto a balanced manifold. This manifold improves the conditioning of the loss landscape while preserving the adapted matrix. The projection step is computationally lightweight and integrates seamlessly into existing fine-tuning pipelines. Empirically, BaLoRA converges faster than standard LoRA and achieves superior performance across a range of fine-tuning tasks.
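The invariance described in the abstract can be illustrated in the simplest possible setting, a rank-1 update. The sketch below is not taken from the paper or its code release: it assumes a rank-1 update b aᵀ, considers only the scalar rescaling (b, a) → (c·b, a/c), and uses "equal norms of the two factors" as an illustrative notion of balance. The helper names `outer`, `norm`, `rescale`, and `balance` are hypothetical, and the actual BaLoRA projection may differ.

```python
import math

def outer(b, a):
    """Rank-1 update b a^T, returned as a list of rows."""
    return [[bi * aj for aj in a] for bi in b]

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def rescale(b, a, c):
    """Apply (b, a) -> (c*b, a/c); the product b a^T is unchanged."""
    return [c * x for x in b], [x / c for x in a]

def balance(b, a):
    """Assumed notion of balance: rescale so that both factors have equal norm."""
    nb, na = norm(b), norm(a)
    if nb == 0.0 or na == 0.0:
        return list(b), list(a)  # nothing to balance for a zero update
    return rescale(b, a, math.sqrt(na / nb))

# Two factor pairs that represent the same update but have very different scales.
b, a = [3.0, 4.0], [0.6, 0.0, 0.8]
b_bad, a_bad = rescale(b, a, 100.0)   # norms 500 and 0.01
b_bal, a_bal = balance(b_bad, a_bad)  # norms equalized, product preserved

print(norm(b_bad), norm(a_bad))
print(norm(b_bal), norm(a_bal))
```

In this toy case the unbalanced pair and the balanced pair produce the same matrix, so the step changes only the parameterization, not the adapted weights. For rank greater than one the invariance is larger than a single scalar, which this sketch does not cover.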

Lay Summary: Large language models (LLMs) are typically adapted to specific tasks through a technique called Low-Rank Adaptation (LoRA), which fine-tunes only a small number of additional parameters rather than the full model. While efficient, LoRA has a subtle redundancy: many different sets of parameters can produce the exact same model update. We show that this redundancy has real consequences — some of these equivalent parameter sets make the optimization landscape much harder to navigate, slowing down training. To fix this, we introduce BaLoRA, a simple extension of LoRA that steers the parameters toward a "well-behaved" region during training. This correction is computationally cheap and easy to plug into existing workflows. In practice, BaLoRA trains faster and achieves better performance than standard LoRA across a variety of tasks.

Link To Code: https://github.com/vcastin/balora

Primary Area: Optimization->Non-Convex

Keywords: Fine-tuning, LoRA, Transformers, balanced manifold, parameter invariance


Submission Number: 11611
