From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD

Published: 09 Jun 2025, Last Modified: 09 Jun 2025 · HiLD at ICML 2025 Poster · CC BY 4.0
Keywords: Feature learning, stochastic gradient descent, single-index model
Abstract:

To understand feature learning dynamics in neural networks, recent theoretical works have focused on gradient-based learning of Gaussian single-index models, where the label is a function of a latent one-dimensional projection of the input. While the sample complexity of online SGD is determined by the information exponent of the link function, recently proposed variants of SGD that introduce non-correlational updates are instead limited by the generative exponent. However, this picture only holds for sufficiently large learning rates. In this paper, we characterize the relationship between learning rate and sample complexity for a general class of gradient-based algorithms, and demonstrate a phase transition from an "information exponent regime" at small learning rates to a "generative exponent regime" at large learning rates. Our framework covers prior analyses of online SGD and SGD with batch reuse, while also introducing a new layer-wise training algorithm. Our theoretical study demonstrates that the choice of learning rate is as important as the design of the algorithm in achieving statistical and computational efficiency.
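To make the setting concrete, the following is a minimal sketch of the Gaussian single-index model the abstract refers to: labels depend on the input only through a hidden direction, and a single neuron is trained with online (one-pass) spherical SGD on a correlational objective. All concrete choices here (dimension, link function, learning rate, step clipping) are illustrative and are not the paper's algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100
w_star = np.zeros(d)
w_star[0] = 1.0                      # hidden (teacher) direction

def link(z):
    # He_2(z) = z^2 - 1: an even link, hence information exponent 2
    return z * z - 1.0

w = rng.standard_normal(d)
w /= np.linalg.norm(w)               # random start: overlap ~ 1/sqrt(d)
eta = 5e-3                           # learning rate (illustrative value)

for _ in range(20_000):
    x = rng.standard_normal(d)       # fresh Gaussian sample each step (online SGD)
    y = link(w_star @ x)             # noiseless teacher label
    # gradient of the correlation y * link(w @ x) with respect to w
    grad = y * 2.0 * (w @ x) * x
    step = eta * grad                # ascend the correlation
    n = np.linalg.norm(step)
    if n > 0.2:                      # clip rare heavy-tailed steps (stability heuristic)
        step *= 0.2 / n
    w += step
    w /= np.linalg.norm(w)           # keep the iterate on the unit sphere

overlap = abs(w @ w_star)            # alignment with the hidden direction
```

Since the link is even, the student can align with either `w_star` or `-w_star`, so the absolute overlap is the natural progress measure; the learning rate `eta` is exactly the knob whose scaling the paper's phase-transition analysis concerns.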

Student Paper: Yes
Submission Number: 42