Keywords: low-rank adaptation, architecture-optimizer co-design, large language models, LoRA, low-rank adapter, fine-tuning
TL;DR: We propose PoLAR, a polar-decomposition-based parameterization for efficient fine-tuning of LLMs. PoLAR mitigates the low stable rank seen in LoRA, provably accelerates convergence on a canonical LoRA problem, and lifts accuracy on real-world tasks.
Abstract: We show that low-rank adaptation of large-scale models suffers from a low stable rank that is well below the linear algebraic rank of the subspace, degrading fine-tuning performance. To mitigate the underutilization of the allocated subspace, we propose PoLAR, a parameterization inspired by the polar decomposition that factorizes the low-rank update into two direction matrices constrained to Stiefel manifolds and an unconstrained scale matrix. Our theory shows that PoLAR yields an exponentially faster convergence rate on a canonical low-rank adaptation problem. Pairing the parameterization with Riemannian optimization leads to consistent gains on a commonsense reasoning benchmark with Llama-2-7B.
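To make the parameterization concrete, below is a minimal, illustrative PyTorch sketch, not the authors' implementation: a frozen weight W0 receives a low-rank update X Theta Y^T, where the two direction factors X and Y are kept on Stiefel manifolds (orthonormal columns) via a QR retraction and the scale Theta is unconstrained. The symbol names, the zero initialization of Theta, and the use of a QR retraction in place of a full Riemannian optimizer are our assumptions for illustration.

```python
import torch


def qr_retract(M: torch.Tensor) -> torch.Tensor:
    """Map a tall matrix onto the Stiefel manifold (orthonormal columns) via a QR retraction."""
    Q, R = torch.linalg.qr(M)
    # Flip column signs to make the factorization unique; Q keeps orthonormal columns.
    return Q * torch.sign(torch.diagonal(R)).unsqueeze(0)


class PoLARStyleLinear(torch.nn.Module):
    """Frozen weight W0 plus a PoLAR-style low-rank update X @ Theta @ Y.T (illustrative sketch)."""

    def __init__(self, weight: torch.Tensor, r: int):
        super().__init__()
        m, n = weight.shape
        self.register_buffer("W0", weight.detach().clone())         # frozen pretrained weight
        self.X = torch.nn.Parameter(qr_retract(torch.randn(m, r)))  # direction factor on St(m, r)
        self.Y = torch.nn.Parameter(qr_retract(torch.randn(n, r)))  # direction factor on St(n, r)
        self.Theta = torch.nn.Parameter(torch.zeros(r, r))          # unconstrained scale; zero init => no update at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.X @ self.Theta @ self.Y.T                      # rank-at-most-r update
        return x @ (self.W0 + delta).T

    @torch.no_grad()
    def retract(self):
        """Pull the direction factors back onto their Stiefel manifolds after a gradient step."""
        self.X.copy_(qr_retract(self.X))
        self.Y.copy_(qr_retract(self.Y))
```

In a training loop, the retraction would be applied after each optimizer step (loss.backward(); optimizer.step(); layer.retract()), or replaced by a proper Riemannian optimizer as in the paper.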
Submission Number: 61