Track: tiny paper (up to 4 pages)
Keywords: optimization, parameter symmetry, equivariance, Gauss-Newton, Whitening
TL;DR: Symmetry-equivariant preconditioners can be written as pullbacks, with Gauss-Newton falling out as a symmetry-commuting preconditioner with good "conditioning". Whitening exhibits structured symmetry breaking compared to gradient descent.
Abstract: Neural networks have architecture-dependent parameter space symmetries under which distinct parameters realize the same function. If an optimizer does not behave consistently under symmetry transformations, optimization dynamics depend on the arbitrary choice of representative along symmetry orbits. Recent work characterizes symmetries via the Jacobian of the evaluation map, decomposing parameter space into a \emph{functional} subspace affecting outputs and a \emph{fiber} subspace of symmetry directions. We show that Gauss-Newton (GN) is symmetry-equivariant and that symmetry equivariance forces the pullback form $P = J^\top B J$ within a natural class of preconditioners. We then show that GN uniquely achieves optimal convergence conditioning within this class, while whitening achieves optimal whitening conditioning (isotropy of updates) but exhibits structured symmetry breaking due to the square root. These results clarify the relationship among Gauss-Newton, whitening, and symmetries, which could be especially valuable for overparameterized networks.
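As an illustrative worked form of the objects named in the abstract (our notation; in particular, writing the whitening update with a pseudo-inverse square root is an assumption, not stated above): with evaluation-map Jacobian $J$ and output-space curvature $B$,
$$P = J^\top B J, \qquad \Delta\theta_{\mathrm{GN}} = -\,P^{+}\nabla_\theta \mathcal{L}, \qquad \Delta\theta_{\mathrm{whiten}} = -\,\big(P^{+}\big)^{1/2}\nabla_\theta \mathcal{L},$$
where $P^{+}$ denotes the Moore--Penrose pseudo-inverse (needed because $P$ is rank-deficient along fiber directions in overparameterized networks); the extra square root in the whitening update is the source of the structured symmetry breaking noted above.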
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 125