Keywords: Loss landscape, Flat minima, Generalization, Scalar curvature, Functional dimension, Optimization geometry, Implicit regularization, Riemannian geometry
TL;DR: In this work, we derive a novel bound on the scalar curvature of the loss surface in terms of the functional dimension and the eigenvalues of the Hessian of the loss.
Abstract: What does it mean for a loss surface to be flat? A \emph{flat} minimum is one where the loss increases slowly in many directions around the optimum. Intuitively, a flat basin leaves room for parameter perturbations without harming performance, which suggests robustness to noise and potentially better generalisation.
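One way to make this intuition precise (a standard second-order sketch, not the submission's own derivation) is to expand the loss around a minimum $\theta^*$ with Hessian $H$ and eigenpairs $(\lambda_i, v_i)$:
$$
L(\theta^* + \delta) \;\approx\; L(\theta^*) + \tfrac{1}{2}\,\delta^{\top} H\, \delta \;=\; L(\theta^*) + \tfrac{1}{2}\sum_i \lambda_i \,\langle \delta, v_i \rangle^2 .
$$
Directions $v_i$ with small $\lambda_i$ are precisely those along which the loss barely rises.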
A common quantitative measure is the trace of the Hessian, which is the sum of its eigenvalues. Intuitively, large eigenvalues correspond to steep curvature in some directions, so penalising or bounding these eigenvalues helps to find flatter minima.
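To illustrate how the trace can be monitored without ever forming the full Hessian, here is a minimal Hutchinson-style estimator. This is an illustrative sketch, not code from the submission; it assumes the parameters are gathered in a single tensor and uses only Hessian-vector products:

```python
import torch

def hessian_trace(loss_fn, params, n_samples=100):
    """Hutchinson estimator: E[v^T H v] = tr(H) for v with i.i.d.
    Rademacher (+/-1) entries, so the full Hessian is never built.
    Illustrative sketch; `loss_fn` takes a single parameter tensor."""
    loss = loss_fn(params)
    grad, = torch.autograd.grad(loss, params, create_graph=True)
    estimate = 0.0
    for _ in range(n_samples):
        v = torch.randint_like(params, 2) * 2.0 - 1.0  # +/-1 entries
        # Hessian-vector product via a second backward pass.
        hv, = torch.autograd.grad(grad, params, grad_outputs=v,
                                  retain_graph=True)
        estimate += torch.dot(v.flatten(), hv.flatten()).item()
    return estimate / n_samples

# Sanity check on a quadratic: L(w) = 0.5 w^T A w has Hessian A.
A = torch.diag(torch.tensor([4.0, 1.0, 0.25]))      # tr(A) = 5.25
w = torch.zeros(3, requires_grad=True)
print(hessian_trace(lambda p: 0.5 * p @ A @ p, w))  # ~5.25
```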
Alternatively, the scalar curvature of the loss surface has been suggested as a measure of flatness, carrying a more geometric flavour.
Instead of measuring individual directions only, it combines the curvatures of two-dimensional planes (sectional curvatures) into a single scalar at each point of parameter space. In that sense, it is more geometrically meaningful and possibly more robust to coordinate changes.
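To see how the sectional curvatures combine, consider the graph of the loss with its induced metric. At a critical point, a standard Gauss-equation computation (a sketch under common conventions, not the paper's derivation) gives
$$
S \;=\; (\operatorname{tr} H)^2 - \operatorname{tr}(H^2) \;=\; 2\sum_{i<j} \lambda_i \lambda_j ,
$$
so the scalar curvature $S$ aggregates products of Hessian eigenvalues over all pairs of principal directions, rather than weighing each direction individually.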
In this work, we derive a novel bound on the scalar curvature in terms of the functional dimension and the eigenvalues of the Hessian. For consistency with related work, we only consider architectures with ReLU activations, though the results extend readily to more general settings.
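For readers who want to experiment, one common operationalisation of functional dimension (following the rank-of-Jacobian viewpoint; the paper's exact definition may differ) is the rank of the Jacobian of the network outputs with respect to the parameters on a fixed batch. A minimal sketch for a small ReLU network, assuming a recent PyTorch with `torch.func`:

```python
import torch
from torch import nn
from torch.func import functional_call, jacrev

def functional_dimension(model, x, atol=1e-6):
    """Rank of the Jacobian of the outputs on batch x with respect to
    all parameters (batch functional dimension). The tolerance `atol`
    and the batch size are heuristic choices in this sketch."""
    params = dict(model.named_parameters())
    out = lambda p: functional_call(model, p, (x,)).flatten()
    jac = jacrev(out)(params)  # dict: name -> (n_outputs, *param_shape)
    # Flatten each per-parameter block and stack into one Jacobian.
    J = torch.cat([j.reshape(j.shape[0], -1) for j in jac.values()], dim=1)
    return torch.linalg.matrix_rank(J, atol=atol)

net = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
x = torch.randn(64, 2)              # batch larger than the 33 parameters
print(functional_dimension(net, x)) # rank <= 33 here
```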
Serve As Reviewer: ~Georgios_Arvanitidis1
Submission Number: 35