Gradient dynamics of low-rank fine-tuning beyond kernels

ICLR 2025 Conference Submission 12993 Authors

28 Sept 2024 (modified: 27 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: learning theory, fine-tuning, online SGD dynamics, neural networks
TL;DR: We analyze the SGD dynamics of learning rank-1 perturbations beyond the NTK setting, and prove linear sample complexity in the dimension for strong recovery.
Abstract: LoRA has emerged as one of the \emph{de facto} methods for fine-tuning foundation models with low computational cost and memory footprint. The idea is to train only a low-rank perturbation to the weights of a pre-trained model, given supervised data for a downstream task. Despite its empirical success, it remains poorly understood from a mathematical perspective which learning mechanisms ensure that gradient descent converges to useful low-rank perturbations. In this work we initiate the study of low-rank fine-tuning in a student-teacher setting. We are given the weights of a two-layer \emph{base model} $f$, as well as i.i.d. samples $(x,f^*(x))$ where $x$ is Gaussian and $f^*$ is the \emph{teacher model} given by perturbing the weights of $f$ by a rank-1 matrix. This generalizes the setting of \emph{generalized linear model (GLM) regression}, in which the weights of $f$ are zero. When the rank-1 perturbation is comparable in norm to the weight matrix of $f$, the training dynamics are nonlinear. Nevertheless, in this regime we prove under mild assumptions that a student model that is initialized at the base model and trained with online gradient descent will converge to the teacher in $dk^{O(1)}$ iterations, where $k$ is the number of neurons in $f$. Importantly, unlike in the GLM setting, the complexity does not depend on fine-grained properties of the activation's Hermite expansion. We also prove that in our setting, learning the teacher model ``from scratch'' can require significantly more iterations.
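To make the setting concrete, below is a minimal NumPy sketch of the student-teacher setup described in the abstract: a two-layer base model, a teacher obtained by adding a rank-1 perturbation to the base first-layer weights, and a student that keeps the base weights frozen and trains only a rank-1 perturbation with online SGD on fresh Gaussian samples. This is an illustrative sketch, not the paper's construction: the dimensions, ReLU activation, learning rate, step count, and small random initialization of the rank-1 factors are all assumptions.

```python
# Illustrative sketch (assumptions: ReLU activation, squared loss, chosen d, k, lr).
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 8                                    # input dimension, number of neurons
W = rng.standard_normal((k, d)) / np.sqrt(d)    # base first-layer weights (frozen)
a = rng.standard_normal(k) / np.sqrt(k)         # fixed second-layer weights

# Teacher = base model with a rank-1 perturbation u* v*^T of the first layer.
u_star = rng.standard_normal(k); u_star /= np.linalg.norm(u_star)
v_star = rng.standard_normal(d); v_star /= np.linalg.norm(v_star)
W_star = W + np.outer(u_star, v_star)

relu = lambda z: np.maximum(z, 0.0)
f = lambda Wt, x: a @ relu(Wt @ x)              # two-layer network

# Student: train only the rank-1 factors (u, v), starting near the base model.
u = 1e-3 * rng.standard_normal(k)
v = 1e-3 * rng.standard_normal(d)
lr, n_steps = 0.01, 50_000

for _ in range(n_steps):
    x = rng.standard_normal(d)                  # fresh Gaussian sample (online SGD)
    y = f(W_star, x)                            # teacher label
    Wt = W + np.outer(u, v)
    pre = Wt @ x
    err = a @ relu(pre) - y                     # squared-loss residual
    g = err * a * (pre > 0)                     # gradient w.r.t. pre-activations
    grad_u = g * (v @ x)                        # chain rule through the rank-1 factors
    grad_v = (g @ u) * x
    u -= lr * grad_u
    v -= lr * grad_v

# Measure how well the learned rank-1 perturbation matches the teacher's.
rel_err = (np.linalg.norm(np.outer(u, v) - np.outer(u_star, v_star))
           / np.linalg.norm(np.outer(u_star, v_star)))
print(f"relative recovery error of rank-1 perturbation: {rel_err:.3f}")
```

The key contrast with the GLM setting (base weights zero) is visible in the gradients above: because the pre-activations involve the frozen base weights $W$, the signal driving $(u, v)$ is shaped by the base model rather than by the activation's Hermite coefficients alone.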
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12993