Keywords: Deep learning, Symmetry breaking, Loss landscapes, Hessian spectrum, Bulk-outlier spectrum, Curvature
TL;DR: We demonstrate, in three-layer ReLU networks, that key Hessian- and Gauss–Newton-related phenomena in deep learning can be explained by symmetry breaking.
Abstract: We propose symmetry breaking as a unifying principle underlying geometric and optimization phenomena in the training of fully connected three-layer networks. First, we demonstrate the prevalence of critical points that break symmetries jointly induced by the loss, network architecture, and data distribution, in direct agreement with theoretical predictions. Group-theoretic results, seemingly far removed, are then shown to govern the structure of the Hessian and Gauss–Newton matrices, with empirical phenomena characteristic of deep learning, such as the bulk-and-outliers spectrum and the concentration of optimization trajectories in low-dimensional subspaces, emerging naturally as manifestations of symmetry breaking. Leveraging this rich symmetry structure, we employ group representation-theoretic techniques to derive sharp estimates of the eigenspectrum in high dimensions, requiring only a small, fixed subset of Hessian entries. The analysis reveals notable curvature differences between local and global minima, in contrast to the analogous two-layer setting, pointing to a possible dependence of the flat minima conjecture on network depth.
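As a rough illustration of the bulk-and-outliers Hessian spectrum mentioned in the abstract, the following is a minimal sketch, not the paper's code: it assumes a toy Gaussian regression setup and a small fully connected three-layer ReLU network (all widths, scales, and the squared loss are arbitrary illustrative choices), forms the dense Hessian of the empirical loss with JAX, and prints summary statistics of its eigenvalues.

```python
# Minimal sketch (illustrative assumptions; not the paper's setup):
# eigenspectrum of the loss Hessian for a small three-layer ReLU network.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
d_in, d_h1, d_h2, d_out, n = 8, 6, 6, 1, 256  # toy sizes, chosen arbitrarily

# Synthetic Gaussian regression data (assumed data distribution).
kx, ky, k1, k2, k3 = jax.random.split(key, 5)
X = jax.random.normal(kx, (n, d_in))
y = jax.random.normal(ky, (n, d_out))

# Flatten all three weight matrices into one parameter vector so the
# Hessian can be taken with respect to a single argument.
shapes = [(d_in, d_h1), (d_h1, d_h2), (d_h2, d_out)]
params = jnp.concatenate([
    0.5 * jax.random.normal(k, s).ravel()
    for k, s in zip((k1, k2, k3), shapes)
])

def unflatten(theta):
    Ws, i = [], 0
    for a, b in shapes:
        Ws.append(theta[i:i + a * b].reshape(a, b))
        i += a * b
    return Ws

def loss(theta):
    W1, W2, W3 = unflatten(theta)
    preds = jax.nn.relu(jax.nn.relu(X @ W1) @ W2) @ W3
    return jnp.mean((preds - y) ** 2)

# At this toy scale (90 parameters) the dense Hessian is cheap to form exactly.
H = jax.hessian(loss)(params)
eigs = jnp.linalg.eigvalsh(H)  # eigenvalues in ascending order

print("top 5 eigenvalues (candidate outliers):", eigs[-5:])
print("bulk summary: median =", jnp.median(eigs),
      ", 90th percentile =", jnp.percentile(eigs, 90))
```

At realistic scales one would replace the dense `jax.hessian` call with Hessian-vector products and a Lanczos-type eigensolver; the point here is only that a gap between a few large eigenvalues and the remaining bulk can be read off from the sorted spectrum.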
Primary Area: optimization
Submission Number: 12704