Understanding Scaling Laws via Neural Feature Learning Dynamics

Published: 28 Nov 2025, Last Modified: 30 Nov 2025, NeurIPS 2025 Workshop MLxOR, CC BY 4.0
Keywords: Scaling laws; Feature learning dynamics; Infinite-width and infinite-depth limit; Residual networks; Stochastic differential equations; Neural Tangent Kernel (NTK); Maximal update parameterization (μP).
TL;DR: To understand when and why neural scaling laws succeed or fail, we analyze deep ResNets trained with SGD and show that, in the joint infinite-width–depth limit, their feature evolution is governed by a coupled forward–backward SDE system.
Abstract: Deep neural networks have recently revolutionized many domains, largely because their performance improves consistently as resources such as model size, data, and compute are scaled up, a phenomenon formalized as scaling laws. Yet the theoretical basis of these laws remains unclear: why scaling works, and when it breaks down. We address this gap by analyzing the feature learning dynamics of ResNets trained with SGD. In the joint infinite-width–depth limit, we show that feature evolution is governed by a coupled forward–backward stochastic system, which we term the \textit{neural feature learning dynamic system}. This framework clarifies the mechanisms underlying scaling laws and offers a new mathematical tool for studying deep learning dynamics.
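
To make the object named in the abstract concrete, the following is a minimal illustrative sketch, not taken from the paper, of what a coupled forward–backward system for a depth-continuum residual network can look like under a standard $1/\sqrt{L}$ branch scaling. The symbols $H_t$, $G_t$, $b$, $\Sigma$, $\theta_t$, and $\mathcal{L}$ are hypothetical notation chosen for exposition and need not match the paper's formulation.

```latex
% Hypothetical sketch of a coupled forward-backward system over continuous
% depth t in [0, 1]; notation is assumed for exposition, not from the paper.
\begin{align}
  % Forward pass: hidden features H_t evolve across depth, with a drift b
  % and a Brownian term B_t capturing layer-wise randomness in the limit.
  dH_t &= b(H_t, \theta_t)\, dt + \Sigma(H_t, \theta_t)^{1/2}\, dB_t,
        \qquad H_0 = x, \\[2pt]
  % Backward pass: the adjoint (gradient) process G_t runs from the output
  % back to the input, coupled to the forward trajectory; martingale
  % correction terms are omitted for brevity.
  dG_t &= -\nabla_h b(H_t, \theta_t)^{\top} G_t \, dt,
        \qquad G_1 = \nabla_h \mathcal{L}(H_1), \\[2pt]
  % SGD feature learning: the parameter update at depth t is driven by the
  % outer product of the backward signal and the forward features.
  \Delta \theta_t &\propto -\, G_t H_t^{\top}.
\end{align}
```

The intended reading is that the forward equation transports features, the backward equation transports gradients, and their coupling through the parameter update is what drives feature learning as width and depth grow together.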
Submission Number: 116