Track: tiny paper (up to 4 pages)
Keywords: Randomization, Efficiency, Geometry, Low-Rank
Abstract: Low-rank gradient optimization for large language models is currently divided into two categories: structured methods that rigorously identify subspaces, and randomized approaches employed primarily for computational efficiency. We investigate why random projections are effective, tracing their success to the geometry of the gradient space. Finding that the subspace optimization landscape is nearly flat while a significant portion of the gradient information lies outside the core subspace, we introduce GrassWalk and GrassJump, algorithms that navigate the Grassmannian manifold via random walks and jumps. By coupling this randomized exploration with a subspace-aware optimizer and recovering the otherwise lost gradient signal, we achieve state-of-the-art results. Our findings reframe randomization not merely as a computational shortcut, but as a geometrically principled approach to high-dimensional optimization.
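The abstract does not include implementation details, so the following is only a minimal sketch of what a random-walk step on the Grassmannian could look like: perturb an orthonormal basis along a random tangent direction, retract back onto the manifold with a QR decomposition, and split the full gradient into its in-subspace and residual parts. The function names (`grassmann_walk_step`, `project_gradient`) and the tangent-perturbation-plus-QR-retraction scheme are illustrative assumptions, not the paper's actual GrassWalk/GrassJump algorithms.

```python
import numpy as np

def grassmann_walk_step(Q, step_size, rng):
    """One illustrative random-walk step on the Grassmannian Gr(n, r).

    Q: (n, r) matrix with orthonormal columns spanning the current subspace.
    Perturbs Q along a random tangent direction, then retracts back onto
    the manifold via QR decomposition.
    """
    n, r = Q.shape
    # Sample an ambient Gaussian direction and project it onto the
    # tangent space at Q: (I - Q Q^T) Z.
    Z = rng.standard_normal((n, r))
    tangent = Z - Q @ (Q.T @ Z)
    # Move along the tangent direction and retract with (reduced) QR,
    # which restores orthonormal columns.
    Q_new, _ = np.linalg.qr(Q + step_size * tangent)
    return Q_new

def project_gradient(G, Q):
    """Split a full gradient G (n, m) into its low-rank component in
    span(Q) and the residual signal lying outside the subspace."""
    low_rank = Q.T @ G            # (r, m) compressed gradient
    residual = G - Q @ low_rank   # gradient information outside the core subspace
    return low_rank, residual

# Hypothetical usage: drift the subspace a little each step.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((1024, 8)))
Q = grassmann_walk_step(Q, step_size=0.1, rng=rng)
```

The QR retraction keeps the basis orthonormal after every perturbation, and the residual term is one way to account for the gradient signal that falls outside the tracked subspace, as the abstract emphasizes.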
Anonymization: This submission has been anonymized for double-blind review by removing identifying information such as names, affiliations, and URLs.
Presenter: ~Sahar_Rajabi1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Serve As Reviewer: ~Sahar_Rajabi1
Submission Number: 114