Keywords: Goldilocks Zone, Initialization, Trainability, Curvature Analysis, Architectural Inhomogeneity
TL;DR: We analyze the Goldilocks zone in inhomogeneous networks and show that softmax temperature scaling provides a more robust probe of curvature and trainability than weight scaling.
Abstract: We investigate how architectural inhomogeneities, such as biases, layer normalization, and residual connections, affect the curvature of the loss landscape at initialization and its link to trainability. We focus on the Goldilocks zone, a region in parameter space with excess positive curvature, previously associated with improved optimization in homogeneous networks. To extend this analysis, we compare two scaling strategies: weight scaling and softmax temperature scaling.
Our results show that in networks with biases or residual connections, both strategies identify a Goldilocks zone aligned with better training. In contrast, layer normalization yields lower or even negative curvature while optimization remains stable, revealing a disconnect between curvature and trainability. Softmax temperature scaling behaves more consistently across models, making it a more robust probe. Overall, the Goldilocks zone remains relevant in inhomogeneous networks, but its geometry and predictive power depend on architectural choices, particularly normalization.
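Below is a minimal, self-contained sketch (not the authors' code) of the two probes the abstract contrasts: weight scaling, which multiplies every parameter by a factor alpha, and softmax temperature scaling, which divides the logits by a temperature T before the loss. The toy MLP, random data, and the curvature statistic (fraction of positive Hessian eigenvalues at initialization, in the spirit of the Goldilocks-zone analysis) are illustrative assumptions, feasible only at this tiny scale.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_mlp():
    # Tiny MLP with biases, one of the inhomogeneities the paper discusses.
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))

def loss_fn(model, x, y, temperature=1.0):
    # Softmax temperature scaling: divide logits by T before cross-entropy.
    logits = model(x) / temperature
    return nn.functional.cross_entropy(logits, y)

def positive_curvature_fraction(model, x, y, temperature=1.0):
    # Fraction of positive eigenvalues of the loss Hessian w.r.t. parameters,
    # an excess-positive-curvature statistic of the kind used to delineate
    # the Goldilocks zone. Exact eigendecomposition only works for tiny nets.
    params = list(model.parameters())
    loss = loss_fn(model, x, y, temperature)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    rows = []
    for i in range(flat_grad.numel()):
        # Each Hessian row is the gradient of one gradient component.
        row = torch.autograd.grad(flat_grad[i], params, retain_graph=True)
        rows.append(torch.cat([r.reshape(-1) for r in row]))
    hessian = torch.stack(rows)
    eigvals = torch.linalg.eigvalsh(hessian)
    return (eigvals > 0).float().mean().item()

x = torch.randn(64, 10)
y = torch.randint(0, 3, (64,))
model = make_mlp()

# Probe 1: weight scaling -- multiply all parameters by alpha, T fixed at 1.
for alpha in (0.5, 1.0, 2.0):
    scaled = make_mlp()
    with torch.no_grad():
        for p_src, p_dst in zip(model.parameters(), scaled.parameters()):
            p_dst.copy_(alpha * p_src)
    frac = positive_curvature_fraction(scaled, x, y)
    print(f"alpha={alpha}: positive curvature fraction = {frac:.3f}")

# Probe 2: softmax temperature scaling -- weights fixed, vary T.
for T in (0.5, 1.0, 2.0):
    frac = positive_curvature_fraction(model, x, y, temperature=T)
    print(f"T={T}: positive curvature fraction = {frac:.3f}")
```

The two loops probe the same initialization along different one-parameter families; the abstract's claim is that the temperature family behaves more consistently once inhomogeneities such as layer normalization break the scale symmetry that weight scaling relies on.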
Student Paper: Yes
Submission Number: 61