Keywords: Deep learning, Symmetry breaking, Loss landscapes, Hessian spectrum, Bulk-outlier spectrum, Curvature
TL;DR: We demonstrate, in three-layer ReLU networks, that key Hessian- and Gauss–Newton-related phenomena in deep learning can be explained by symmetry breaking.
Abstract: We propose symmetry breaking as a unifying principle underlying geometric and optimization phenomena in the training of fully connected three-layer networks. First, we demonstrate the prevalence of critical points that break symmetries jointly induced by the loss, network architecture, and data distribution, in direct agreement with theoretical predictions. Group-theoretic results, seemingly far removed, are then shown to govern the structure of the Hessian and Gauss–Newton matrices, with empirical phenomena characteristic of deep learning, such as the bulk-and-outliers spectrum and the concentration of optimization trajectories in low-dimensional subspaces, emerging naturally as manifestations of symmetry breaking. Leveraging this rich symmetry structure, we employ group representation-theoretic techniques to derive sharp estimates of the eigenspectrum in high dimensions, requiring only a small, fixed subset of Hessian entries. The analysis reveals notable curvature differences between local and global minima, in contrast to the analogous two-layer setting, pointing to a possible dependence of the flat minima conjecture on network depth.
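As a rough illustration of the bulk-and-outliers Hessian spectrum mentioned in the abstract, the following is a minimal sketch, not the paper's code: it assumes a toy Gaussian regression setup and a small fully connected three-layer ReLU network (all widths, scales, and the squared loss are arbitrary illustrative choices), forms the dense Hessian of the empirical loss with JAX, and prints summary statistics of its eigenvalues.

```python
# Minimal sketch (illustrative assumptions; not the paper's setup):
# eigenspectrum of the loss Hessian for a small three-layer ReLU network.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
d_in, d_h1, d_h2, d_out, n = 8, 6, 6, 1, 256  # toy sizes, chosen arbitrarily

# Synthetic Gaussian regression data (assumed data distribution).
kx, ky, k1, k2, k3 = jax.random.split(key, 5)
X = jax.random.normal(kx, (n, d_in))
y = jax.random.normal(ky, (n, d_out))

# Flatten all three weight matrices into one parameter vector so the
# Hessian can be taken with respect to a single argument.
shapes = [(d_in, d_h1), (d_h1, d_h2), (d_h2, d_out)]
params = jnp.concatenate([
    0.5 * jax.random.normal(k, s).ravel()
    for k, s in zip((k1, k2, k3), shapes)
])

def unflatten(theta):
    Ws, i = [], 0
    for a, b in shapes:
        Ws.append(theta[i:i + a * b].reshape(a, b))
        i += a * b
    return Ws

def loss(theta):
    W1, W2, W3 = unflatten(theta)
    preds = jax.nn.relu(jax.nn.relu(X @ W1) @ W2) @ W3
    return jnp.mean((preds - y) ** 2)

# At this toy scale (90 parameters) the dense Hessian is cheap to form exactly.
H = jax.hessian(loss)(params)
eigs = jnp.linalg.eigvalsh(H)  # eigenvalues in ascending order

print("top 5 eigenvalues (candidate outliers):", eigs[-5:])
print("bulk summary: median =", jnp.median(eigs),
      ", 90th percentile =", jnp.percentile(eigs, 90))
```

At realistic scales one would replace the dense `jax.hessian` call with Hessian-vector products and a Lanczos-type eigensolver; the point here is only that a gap between a few large eigenvalues and the remaining bulk can be read off from the sorted spectrum.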
Primary Area: optimization
Submission Number: 12704