Keywords: loss topography, second-order methods, curvature, gradient discontinuities, gradient glass
TL;DR: We show that loss landscapes in deep nets have curvature beyond the Hessian, derive optimal estimators for these glass-like terms, and validate their role with Alice, a lightweight diagnostic probe.
Abstract: Second-order methods seek to exploit loss curvature, but in deep networks the Hessian often fails to approximate it well, especially near sharp gradient transitions induced by common activation functions.
We introduce an analytic framework that characterizes curvature of expectation, showing how such transitions generate pseudorandom gradient perturbations that combine into a glass-like structure, analogous to amorphous solids.
From this perspective we derive: (i) the density of gradient variations and bounds on expected loss changes, (ii) optimal kernels and sampling schemes to estimate both Hessian and glass curvature from ordinary gradients, and (iii) quasi-Newton updates that unify these curvature terms with exactness conditions under Nesterov acceleration.
To probe their empirical role, we implement Alice, a lightweight diagnostic that inserts curvature estimates into controlled updates, revealing which terms genuinely influence optimization.
Taken together, our results offer a new theoretical picture of nonsmooth loss landscapes that can inform future work on pruning, quantization, and curvature-aware training.
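The simplest instance of the idea in (ii), estimating curvature from ordinary gradients without forming a Hessian, is a central-difference directional-curvature probe. The sketch below is illustrative only: the function names and the quadratic toy loss are our own assumptions, and it does not reproduce the paper's optimal kernels or sampling schemes.

```python
import numpy as np

def grad_quadratic(w, A):
    """Gradient of the toy quadratic loss f(w) = 0.5 * w @ A @ w (assumed example)."""
    return A @ w

def directional_curvature(grad_fn, w, v, eps=1e-4):
    """Estimate v^T H v from two ordinary gradient evaluations,
    via the central difference (g(w + eps*v) - g(w - eps*v)) / (2*eps) dotted with v.
    No Hessian is ever materialized."""
    g_plus = grad_fn(w + eps * v)
    g_minus = grad_fn(w - eps * v)
    return (g_plus - g_minus) @ v / (2 * eps)

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
w = np.array([0.5, -1.0])
v = np.array([1.0, 0.0])

est = directional_curvature(lambda x: grad_quadratic(x, A), w, v)
# For a quadratic loss the estimate recovers v^T A v (here 3.0);
# near the sharp gradient transitions the paper studies, such
# finite-difference probes pick up the glass-like terms as well.
```

For a smooth quadratic this recovers the exact Hessian quadratic form; the paper's contribution is characterizing what such gradient-difference estimators return when the landscape is nonsmooth.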
Primary Area: optimization
Submission Number: 16597