Fast Learning Rate Transfer for Gradient Descent in Sketched Linear Regression

Published: 29 May 2026, Last Modified: 10 Jun 2026
Venue: HiLD at ICML 2026 Spotlight
Readers: Everyone
License: CC BY 4.0
Keywords: Hyperparameter transfer, learning-rate transfer, sketched linear regression, neural scaling laws
Abstract: We study the efficiency of the hyperparameter (HP) transfer strategy from \citet{yang2022tensor} in a solvable sketched-linear model with varying width (i.e., number of features), where the HP of interest is the gradient descent (GD) learning rate. Following the fast-transfer framework of \citet{ghosh2025mechanisms}, we characterize \textit{fast transfer} (which implies computational gain) by comparing the convergence rates of the loss and the optimal HP with respect to the scaling dimension (model width $n$). For a fixed optimization horizon $T$, we prove a central limit theorem (CLT) for the optimal learning rate and the optimally tuned loss, yielding unconditional fast transfer. On the other hand, for growing horizons (i.e., when $T$ jointly diverges with $n$), the optimal learning rate approaches the stability edge beyond which GD diverges, and under an explicit scale separation between $T$ and $n$, we establish fast transfer with rates depending on the spectral source and capacity conditions.
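To make the setup in the abstract concrete, here is a minimal numerical sketch. It is not the paper's construction: the Gaussian sketch with $1/\sqrt{n}$ scaling, the power-law spectrum with exponent `alpha`, the all-ones target, GD on the population loss from a zero initialization, and every function name (`sketched_problem`, `loss_after_gd`, `tune_learning_rate`) are assumptions chosen for illustration. For each width $n$ it runs $T$ steps of GD over a grid of learning rates below the stability edge $2/\lambda_{\max}(H)$ and reports the best one.

```python
import numpy as np


def sketched_problem(n, d=512, alpha=1.5, seed=0):
    """Population quadratic of a width-n sketched linear model (assumed toy setup)."""
    rng = np.random.default_rng(seed)
    lam = np.arange(1, d + 1, dtype=float) ** (-alpha)  # assumed power-law spectrum
    w_star = np.ones(d)                                  # assumed target vector
    S = rng.standard_normal((n, d)) / np.sqrt(n)         # assumed Gaussian sketch
    H = (S * lam) @ S.T                # Hessian S diag(lam) S^T, shape (n, n)
    b = (S * lam) @ w_star             # linear term S diag(lam) w_star
    c = 0.5 * w_star @ (lam * w_star)  # loss value at theta = 0
    return H, b, c


def loss_after_gd(H, b, c, eta, T):
    """Loss 0.5 theta^T H theta - b^T theta + c after T GD steps from theta = 0."""
    theta = np.zeros(len(b))
    for _ in range(T):
        theta = theta - eta * (H @ theta - b)
    return 0.5 * theta @ H @ theta - b @ theta + c


def tune_learning_rate(n, T, fractions=np.linspace(0.05, 0.99, 40)):
    """Grid-search the learning rate as a fraction of the stability edge."""
    H, b, c = sketched_problem(n)
    edge = 2.0 / np.linalg.eigvalsh(H)[-1]  # GD on a quadratic diverges beyond this
    etas = fractions * edge
    losses = np.array([loss_after_gd(H, b, c, eta, T) for eta in etas])
    k = int(np.argmin(losses))
    return etas[k], losses[k], edge, c


if __name__ == "__main__":
    for n in (16, 32, 64, 128):
        eta_star, loss_star, edge, _ = tune_learning_rate(n, T=20)
        print(f"n={n:4d}  eta*={eta_star:.3f}  edge={edge:.3f}  loss*={loss_star:.4f}")
```

Comparing how quickly `eta_star` and `loss_star` settle as `n` grows gives an informal picture of the two convergence rates the abstract compares. Rerunning with larger `T` lets one check, in this toy setting only, how close the tuned learning rate sits to the stability edge.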
Submission Number: 176