Keywords: nonconvex optimization, initialization, quadratic rate, low-rank adapter, LoRA
TL;DR: We prove that a specific initialization can significantly improve the complexity bounds of ScaledGD for matrix factorization under a wide spectrum of settings, including quadratic convergence in cases where only linear rates were previously known.
Abstract: This work revisits the classical low-rank matrix factorization problem and unveils the critical role of initialization in shaping convergence rates for this nonconvex and nonsmooth optimization problem. We introduce Nystrom initialization, which significantly improves the global convergence of Scaled Gradient Descent (ScaledGD) in both symmetric and asymmetric matrix factorization tasks. Specifically, we prove that ScaledGD with Nystrom initialization achieves quadratic convergence in cases where only linear rates were previously known. Furthermore, we extend this initialization to low-rank adapters (LoRA), which are commonly used for fine-tuning foundation models. Our approach, NoRA, i.e., LoRA with Nystrom initialization, demonstrates superior performance on various downstream tasks with large language and diffusion models.
Submission Number: 17
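For intuition, below is a minimal sketch of what a Nystrom-style initialization could look like, first for rank-r symmetric matrix factorization and then adapted to seed a LoRA weight pair. The function names, the Gaussian sketch matrix, and the LoRA adaptation are illustrative assumptions, not the paper's own implementation; the actual NoRA recipe may differ in detail.

```python
import numpy as np


def nystrom_factor_init(M, r, seed=0):
    """Illustrative Nystrom-style initialization for a symmetric PSD matrix M (n x n).

    Returns X0 (n x r) such that X0 @ X0.T equals the rank-r Nystrom approximation
    M @ Omega @ pinv(Omega.T @ M @ Omega) @ Omega.T @ M.
    """
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    Omega = rng.standard_normal((n, r))        # random Gaussian sketch (assumed)
    Y = M @ Omega                              # sketch of the column space
    C = Omega.T @ Y                            # r x r core matrix
    w, V = np.linalg.eigh((C + C.T) / 2.0)     # symmetrize before eigendecomposition
    inv_sqrt = np.where(w > 1e-12, 1.0 / np.sqrt(np.maximum(w, 1e-12)), 0.0)
    return Y @ (V * inv_sqrt) @ V.T            # X0 = Y @ C^{-1/2}


def nystrom_lora_init(W, r, seed=0):
    """Hypothetical NoRA-style seeding of a LoRA pair (B, A) from a weight W.

    B0 @ A0 reproduces the rank-r Nystrom approximation of W (d_out x d_in).
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = W.shape
    Omega = rng.standard_normal((d_in, r))
    B0 = W @ Omega                                       # d_out x r
    A0 = np.linalg.pinv(Omega.T @ B0) @ (Omega.T @ W)    # r x d_in
    return B0, A0
```

In this sketch, X0 would serve as the starting point for the preconditioned updates of ScaledGD, and (B0, A0) would replace the usual zero/Gaussian initialization of a LoRA adapter.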