Keywords: Streaming k-PCA, Gradient Subspace Drift, Low-Rank LLM Training
TL;DR: Ori is a low-rank optimizer that uses streaming k-PCA (Oja++) to track slowly evolving gradient subspaces, offers dynamic-regret bounds, and, in very low-rank regimes, narrows the gap to full-rank training while improving efficiency over existing methods.
Abstract: A prominent family of low-rank LLM optimizers, pioneered by GaLore, performs gradient-subspace projection: at each step, they project the gradient into a low-dimensional subspace and map the update back to the full model. This type of training reduces memory, but its effectiveness depends on how well the algorithm adapts to a gradient subspace that shifts as the model is optimized. Existing work either periodically recomputes the gradient subspace or performs post-hoc adaptation to catch up with the moving subspace. In this work, we explicitly identify that in practice, the LLM gradient subspace changes slowly and stably, a phenomenon we call gradient subspace drift. We argue that explicitly leveraging this slow drift is key to ensuring stable learning in extremely low-rank scenarios. To this end, we propose Ori, a novel low-rank optimizer for training LLMs that achieves state-of-the-art accuracy under stringent low-rank constraints. To capture the shifting principal gradient directions over time, Ori employs a streaming $k$-PCA algorithm inspired by Oja++, whose low per-step cost enables high-frequency, on-the-fly subspace tracking. In practice, Ori reduces the performance gap between state-of-the-art low-rank optimizers and full-rank optimizers by up to 69.3\% on a 1B LLaMA pretraining task, and achieves better performance on 15 out of 16 task scenarios in RoBERTa-base fine-tuning. We further present a theoretical analysis establishing dynamic regret bounds for subspace tracking under drift, providing formal guarantees that yield sublinear regret in the noiseless setting. Together, these contributions allow Ori to push the performance and efficiency frontiers of gradient-projecting optimizers, narrowing the gap to full-rank training for LLMs. Code can be found at [https://anonymous.4open.science/r/Ori-Submission-2132](https://anonymous.4open.science/r/Ori-Submission-2132).
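To make the mechanism concrete, below is a minimal sketch of how an Oja-style streaming $k$-PCA update can be combined with gradient-subspace projection, assuming vector-shaped gradients and plain SGD for simplicity. This is not Ori's actual implementation; the names (`OjaSubspaceTracker`, `eta_oja`, `low_rank_sgd_step`) are illustrative.

```python
# Sketch (not Ori's actual API) of the two ideas in the abstract:
# (1) Oja-style streaming k-PCA to track the top-k gradient subspace,
# (2) GaLore-style projection of the update into and out of that subspace.
import numpy as np


class OjaSubspaceTracker:
    """Tracks the top-k subspace of streaming gradients with Oja-style updates."""

    def __init__(self, dim: int, k: int, eta_oja: float = 1e-2, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random orthonormal initialization of the (dim x k) basis Q.
        self.Q, _ = np.linalg.qr(rng.standard_normal((dim, k)))
        self.eta = eta_oja

    def update(self, g: np.ndarray) -> np.ndarray:
        """One streaming step: Q <- orth(Q + eta * g (g^T Q))."""
        self.Q, _ = np.linalg.qr(self.Q + self.eta * np.outer(g, g @ self.Q))
        return self.Q


def low_rank_sgd_step(w, g, tracker, lr=1e-3):
    """Project the gradient onto the tracked subspace, step, map back."""
    Q = tracker.update(g)        # refresh the subspace estimate on the fly
    g_low = Q.T @ g              # rank-k projection of the gradient
    return w - lr * (Q @ g_low)  # low-rank update mapped back to full dimension
```

Compared with periodically recomputing an SVD of the gradient, the Oja-style step costs only a rank-one update plus a thin QR, which is what makes the high-frequency, on-the-fly tracking described in the abstract affordable.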
Primary Area: optimization
Submission Number: 21959