Revisiting the Goldilocks Zone in Inhomogeneous Networks

Published: 09 Jun 2025, Last Modified: 09 Jun 2025 · HiLD at ICML 2025 Poster · CC BY 4.0
Keywords: Goldilocks Zone, Initialization, Trainability, Curvature Analysis, Architectural Inhomogeneity
TL;DR: We analyze the Goldilocks zone in inhomogeneous networks and show that softmax temperature scaling provides a more robust probe of curvature and trainability than weight scaling.
Abstract: We investigate how architectural inhomogeneities—such as biases, layer normalization, and residual connections—affect the curvature of the loss landscape at initialization and its link to trainability. We focus on the Goldilocks zone, a region in parameter space with excess positive curvature, previously associated with improved optimization in homogeneous networks. To extend this analysis, we compare two scaling strategies: weight scaling and softmax temperature scaling. Our results show that in networks with biases or residual connections, both strategies identify a Goldilocks zone aligned with better training. In contrast, layer normalization leads to lower or negative curvature, yet stable optimization—revealing a disconnect between curvature and trainability. Softmax temperature scaling behaves more consistently across models, making it a more robust probe. Overall, the Goldilocks zone remains relevant in inhomogeneous networks, but its geometry and predictive power depend on architectural choices, particularly normalization.
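For readers who want to experiment with the two probes the abstract compares, here is a minimal sketch (not the paper's code). The toy MLP, the weight-scale factor `alpha`, and the softmax temperature `T` are illustrative assumptions; the curvature statistic Tr(H)/||H||_F, computed from the exact Hessian of a tiny model, is used in the spirit of the Goldilocks-zone "excess positive curvature" measure rather than as the paper's exact metric.

```python
# Sketch: probe loss-landscape curvature at initialization under
# (a) weight scaling and (b) softmax temperature scaling.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(64, 10)            # toy inputs
y = torch.randint(0, 3, (64,))     # toy labels

# Flat-parameter forward pass for a 10-16-3 MLP with biases
# (biases make the network inhomogeneous).
sizes = [(16, 10), (16,), (3, 16), (3,)]
n_params = sum(torch.Size(s).numel() for s in sizes)

def unflatten(theta):
    params, i = [], 0
    for s in sizes:
        n = torch.Size(s).numel()
        params.append(theta[i:i + n].view(s))
        i += n
    return params

def loss_fn(theta, alpha=1.0, T=1.0):
    W1, b1, W2, b2 = unflatten(alpha * theta)   # (a) weight scaling
    h = torch.tanh(X @ W1.T + b1)
    logits = (h @ W2.T + b2) / T                # (b) temperature scaling
    return F.cross_entropy(logits, y)

theta0 = torch.randn(n_params) * 0.1

for alpha, T in [(0.1, 1.0), (1.0, 1.0), (1.0, 10.0)]:
    H = torch.autograd.functional.hessian(
        lambda th: loss_fn(th, alpha=alpha, T=T), theta0)
    evals = torch.linalg.eigvalsh(H)
    # Excess positive curvature: trace relative to Frobenius norm.
    stat = evals.sum() / evals.norm()
    print(f"alpha={alpha}, T={T}: Tr(H)/||H||_F = {stat.item():.3f}")
```

Sweeping `alpha` rescales every parameter jointly, while sweeping `T` rescales only the pre-softmax logits; comparing the statistic along the two sweeps is one simple way to see why the paper treats temperature scaling as the more architecture-agnostic probe.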
Student Paper: Yes
Submission Number: 61