Where Do Large Learning Rates Lead Us? A Feature Learning Perspective

Published: 16 Jun 2024, Last Modified: 20 Jul 2024 · HiLD at ICML 2024 Poster · CC BY 4.0
Keywords: learning rate, neural networks, feature learning
TL;DR: We study feature learning properties of training with different initial LRs.
Abstract: It is conventional wisdom that using large learning rates (LRs) early in training improves generalization. Following a line of research devoted to understanding this effect mechanistically, we conduct an empirical study in a controlled setting focusing on the feature learning properties of training with different initial LRs. We show that the range of initial LRs providing the best generalization of the final solution results in a sparse set of learned features, with a clear focus on those most relevant for the task. In contrast, training that starts with LRs that are too small attempts to learn all features simultaneously, resulting in poor generalization. Conversely, initial LRs that are too large fail to extract meaningful patterns from the data.
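To make the kind of experiment described in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' actual setup): the same small MLP is trained on synthetic data where only the first two of twenty input features are task-relevant, using three different initial LRs followed by a decay to a small LR, and we report test accuracy plus a crude proxy for feature sparsity (the fraction of first-layer weight mass on the relevant features). All names and hyperparameters here are illustrative assumptions.

```python
# Hypothetical toy experiment (illustrative only, not the paper's exact setup):
# train with a given initial LR, then decay to a small LR, and compare outcomes.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic binary task: only the first 2 of 20 features are informative.
n, d = 2000, 20
X = torch.randn(n, d)
y = (X[:, 0] + X[:, 1] > 0).long()
X_tr, y_tr, X_te, y_te = X[:1500], y[:1500], X[1500:], y[1500:]

def run(initial_lr, warm_steps=500, fine_steps=500, fine_lr=1e-3):
    model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 2))
    loss_fn = nn.CrossEntropyLoss()
    # Phase 1: train with the initial LR under study.
    opt = torch.optim.SGD(model.parameters(), lr=initial_lr)
    for _ in range(warm_steps):
        opt.zero_grad()
        loss_fn(model(X_tr), y_tr).backward()
        opt.step()
    # Phase 2: decay to a small LR, as in the usual large-then-small schedule.
    opt = torch.optim.SGD(model.parameters(), lr=fine_lr)
    for _ in range(fine_steps):
        opt.zero_grad()
        loss_fn(model(X_tr), y_tr).backward()
        opt.step()
    acc = (model(X_te).argmax(1) == y_te).float().mean().item()
    # Crude sparsity proxy: per-input-feature norm of first-layer weights.
    w = model[0].weight.detach().norm(dim=0)
    return acc, w

for lr in (1e-3, 1e-1, 10.0):
    acc, w = run(lr)
    print(f"initial LR {lr:g}: test acc {acc:.3f}, "
          f"weight mass on relevant features {w[:2].sum() / w.sum():.2f}")
```

Under this toy setup one would look for the pattern the abstract reports: a moderate-to-large initial LR concentrating weight mass on the two relevant features and generalizing best, a very small initial LR spreading mass over all features, and a very large initial LR failing to learn at all.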
Student Paper: Yes
Submission Number: 30