Keywords: Streaming k-PCA, Gradient Subspace Drift, Low-Rank LLM Training
TL;DR: Ori is a low-rank optimizer that uses streaming k-PCA (Oja++) to track slowly evolving gradient subspaces, offers dynamic-regret bounds, and, in very low-rank regimes, narrows the gap to full-rank training while improving efficiency over existing methods.
Abstract: A prominent family of low-rank LLM optimizers, pioneered by GaLore, performs gradient-subspace projection: at each step, they project the gradient into a low-dimensional subspace and map the update back to the full model. This type of training reduces memory, but its effectiveness depends on how well the algorithm adapts to a gradient subspace that shifts as the model is optimized. Existing work either periodically recomputes the gradient subspace or performs post-hoc adaptation to catch up with the moving subspace. In this work, we explicitly identify that in practice, the LLM gradient subspace changes slowly and stably, a phenomenon we call gradient subspace drift. We argue that explicitly leveraging this slow drift is key to ensuring stable learning in extremely low-rank scenarios. To this end, we propose Ori, a novel low-rank optimizer for training LLMs that achieves state-of-the-art accuracy under stringent low-rank constraints. To capture the shifting principal gradient directions over time, Ori employs a streaming $k$-PCA algorithm inspired by Oja++, whose low per-step cost enables high-frequency, on-the-fly subspace tracking. In practice, Ori reduces the performance gap between state-of-the-art low-rank optimizers and full-rank optimizers by up to 69.3\% on a 1B LLaMA pretraining task, and achieves better performance on 15 out of 16 task scenarios in RoBERTa-base fine-tuning. We further present a theoretical analysis establishing dynamic regret bounds for subspace tracking under drift, providing formal guarantees that yield sublinear regret in the noiseless setting. Together, these contributions allow Ori to push the performance and efficiency frontiers of gradient-projecting optimizers, narrowing the gap to full-rank training for LLMs. Code can be found at [https://anonymous.4open.science/r/Ori-Submission-2132](https://anonymous.4open.science/r/Ori-Submission-2132).
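To make the mechanism concrete, below is a minimal sketch of how an Oja-style streaming $k$-PCA update can be combined with gradient-subspace projection, assuming vector-shaped gradients and plain SGD for simplicity. This is not Ori's actual implementation; the names (`OjaSubspaceTracker`, `eta_oja`, `low_rank_sgd_step`) are illustrative.

```python
# Sketch (not Ori's actual API) of the two ideas in the abstract:
# (1) Oja-style streaming k-PCA to track the top-k gradient subspace,
# (2) GaLore-style projection of the update into and out of that subspace.
import numpy as np


class OjaSubspaceTracker:
    """Tracks the top-k subspace of streaming gradients with Oja-style updates."""

    def __init__(self, dim: int, k: int, eta_oja: float = 1e-2, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random orthonormal initialization of the (dim x k) basis Q.
        self.Q, _ = np.linalg.qr(rng.standard_normal((dim, k)))
        self.eta = eta_oja

    def update(self, g: np.ndarray) -> np.ndarray:
        """One streaming step: Q <- orth(Q + eta * g (g^T Q))."""
        self.Q, _ = np.linalg.qr(self.Q + self.eta * np.outer(g, g @ self.Q))
        return self.Q


def low_rank_sgd_step(w, g, tracker, lr=1e-3):
    """Project the gradient onto the tracked subspace, step, map back."""
    Q = tracker.update(g)        # refresh the subspace estimate on the fly
    g_low = Q.T @ g              # rank-k projection of the gradient
    return w - lr * (Q @ g_low)  # low-rank update mapped back to full dimension
```

Compared with periodically recomputing an SVD of the gradient, the Oja-style step costs only a rank-one update plus a thin QR, which is what makes the high-frequency, on-the-fly tracking described in the abstract affordable.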
Primary Area: optimization
Submission Number: 21959