Learning rate collapse prevents training recurrent neural networks at scale

Published: 23 Sept 2025 · Last Modified: 29 Oct 2025 · NeurReps 2025 Proceedings · CC BY 4.0
Keywords: Recurrent neural networks, computational neuroscience
TL;DR: Large recurrent neural networks fail to train because of learning rate collapse, but low-dimensional representations can overcome this barrier.
Abstract: Recurrent neural networks (RNNs) are central to modeling neural computation in systems neuroscience, yet the principles that enable their stable and efficient training at large scales remain poorly understood. Seminal work in machine learning predicts that the effective learning rate should shrink with the size of feedforward networks, and practical heuristics such as learning rate schedulers or width-dependent rescaling have been proposed to stabilize training. Here we demonstrate that an analogous phenomenon, which we term learning rate collapse, poses a concrete barrier for large RNNs in neuroscience-inspired short-term memory tasks. We show that the maximum trainable learning rate decreases as a power law with neuron number, forcing large networks to converge only at critically slow learning rates. While parameter rescaling with the inverse of network size mitigates instability, larger networks still learn substantially slower, indicating that collapse is a structural limitation rather than a trivial parametrization artifact. These optimization limits are further compounded by severe memory demands, which together make training large RNNs both unstable and computationally costly. As a proof of principle, we design a learning process that enforces a low-dimensional geometry in RNN representations, which reduces memory costs and mitigates learning rate collapse. These results situate learning rate collapse within a broader lineage of scaling analyses and establish it as a fundamental obstacle for training large RNNs, with potential solutions likely to come from future work that incorporates careful consideration of symmetry and geometry in neural representations.
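
To make the abstract's "low-dimensional geometry" remedy concrete, here is a minimal, hypothetical sketch in PyTorch, not the authors' implementation. It assumes a rank-r factorization of the recurrent connectivity, W = m nᵀ / N, so that parameters and activity dynamics are confined to an r-dimensional subspace; the class name LowRankRNN, the factor names m and n_vec, and the 1/N scaling are illustrative assumptions.

```python
# Illustrative sketch (not the paper's code): a rank-r recurrent weight
# parametrization W = m @ n_vec.T / N. It stores O(N*r) recurrent parameters
# instead of O(N^2), reducing memory, and the 1/N factor keeps the effective
# recurrent gain roughly size-independent.
import torch
import torch.nn as nn


class LowRankRNN(nn.Module):
    def __init__(self, n_neurons: int, rank: int, n_inputs: int, n_outputs: int):
        super().__init__()
        self.n = n_neurons
        # Low-rank factors of the recurrent connectivity (hypothetical names).
        self.m = nn.Parameter(torch.randn(n_neurons, rank) / n_neurons**0.5)
        self.n_vec = nn.Parameter(torch.randn(n_neurons, rank) / n_neurons**0.5)
        self.w_in = nn.Parameter(torch.randn(n_neurons, n_inputs) / n_inputs**0.5)
        self.w_out = nn.Parameter(torch.randn(n_outputs, n_neurons) / n_neurons**0.5)

    def forward(self, inputs: torch.Tensor, dt_over_tau: float = 0.1) -> torch.Tensor:
        # inputs: (time, batch, n_inputs)
        x = torch.zeros(inputs.shape[1], self.n, device=inputs.device)
        outputs = []
        for u in inputs:
            r = torch.tanh(x)
            # Recurrent drive (r @ n_vec) @ m.T / N: activity is projected onto
            # r latent directions and read back out, enforcing low-dimensionality.
            rec = (r @ self.n_vec) @ self.m.T / self.n
            x = x + dt_over_tau * (-x + rec + u @ self.w_in.T)
            outputs.append(x @ self.w_out.T)
        return torch.stack(outputs)


# Usage example with dummy data: 50 time steps, batch of 16, 3 input channels.
rnn = LowRankRNN(n_neurons=1000, rank=2, n_inputs=3, n_outputs=1)
y = rnn(torch.randn(50, 16, 3))
print(y.shape)  # torch.Size([50, 16, 1])
```

Because gradients only flow through the rank-r factors, increasing n_neurons grows the parameter count linearly rather than quadratically, which is one plausible way a low-dimensional parametrization could ease both the memory burden and the learning-rate sensitivity described in the abstract.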
Submission Number: 127