Quasi-potential theory for escape problem: Quantitative sharpness effect on SGD's escape from local minima

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 | ICLR 2022 Submitted | Readers: Everyone
Keywords: deep learning, learning dynamics, SGD, flat minima
Abstract: We develop a quantitative theory of the escape problem of stochastic gradient descent (SGD) and investigate how the sharpness of loss surfaces affects escape. Deep learning has achieved tremendous success in various domains, yet it has opened up theoretical problems; for instance, it remains an open question why SGD can find solutions that generalize well over non-convex loss surfaces. One approach to explaining this phenomenon is the escape problem, which investigates how efficiently SGD escapes from local minima. In this paper, we develop a novel theoretical framework for the escape problem using the "quasi-potential," a notion defined in a fundamental theory of stochastic dynamical systems. We show that quasi-potential theory can handle the geometric properties of loss surfaces and the covariance structure of gradient noise in a unified manner through an eigenvalue argument, whereas previous research studied them separately. Our theoretical results imply that sharpness slows down escape, but SGD's noise structure cancels this effect and ends up exponentially accelerating escape. We also conduct experiments with neural networks trained on real data to empirically validate our theory.
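Background note (not part of the submission; a standard form from the Freidlin-Wentzell theory of large deviations, presumably the "fundamental theory" the abstract refers to; the paper's exact definitions may differ in detail): for a small-noise diffusion
\[
dX_t = -\nabla f(X_t)\, dt + \sqrt{\varepsilon}\, \sigma(X_t)\, dW_t,
\]
the quasi-potential relative to a local minimum \(x^{*}\) is
\[
V(x) = \inf_{T > 0}\ \inf_{\varphi(0) = x^{*},\ \varphi(T) = x}\ \frac{1}{2} \int_{0}^{T} \bigl(\dot{\varphi}_t + \nabla f(\varphi_t)\bigr)^{\top} \bigl(\sigma\sigma^{\top}(\varphi_t)\bigr)^{-1} \bigl(\dot{\varphi}_t + \nabla f(\varphi_t)\bigr)\, dt,
\]
and the mean exit time from the basin \(D\) of \(x^{*}\) satisfies \(\log \mathbb{E}[\tau_D] \sim \min_{y \in \partial D} V(y)/\varepsilon\) as \(\varepsilon \to 0\). The inverse noise covariance \((\sigma\sigma^{\top})^{-1}\) reweights the path cost, which is how the geometry of the loss and the structure of the gradient noise enter a single quantity.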
One-sentence Summary: We develop a novel quasi-potential theory for the escape of SGD, which is more formal and flexible than existing theories.
Supplementary Material: zip
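A minimal toy sketch of the claimed mechanism (not the paper's theory or experiments): on a one-dimensional double-well loss f(x) = a(x^2 - 1)^2, the parameter a scales both the sharpness of the minima (f''(+-1) = 8a) and the barrier height (f(0) - f(-1) = a). The sketch assumes, purely for illustration, that SGD-like gradient noise has variance proportional to the curvature at the minimum (C = 8a), versus isotropic noise of fixed variance. Under isotropic noise the escape time should grow exponentially with a; curvature-scaled noise should roughly cancel that growth.

import numpy as np

rng = np.random.default_rng(0)

def escape_steps(a, noise, lr=0.05, max_steps=400_000):
    """Steps for one run to leave the basin of x = -1 (i.e., to cross x = 0).

    Loss: f(x) = a * (x**2 - 1)**2, so a scales both the sharpness of the
    minima (f''(+-1) = 8a) and the barrier height (f(0) - f(-1) = a).
    """
    x = -1.0
    for t in range(max_steps):
        g = 4.0 * a * x * (x * x - 1.0)    # f'(x)
        if noise == "sgd":
            C = 8.0 * a                    # noise variance tracks curvature at the minimum
        else:
            C = 8.0                        # fixed variance, independent of sharpness
        x = x - lr * g + lr * np.sqrt(C) * rng.normal()
        if x > 0.0:
            return t
    return max_steps                       # budget exhausted: escape did not occur

for a in (1.0, 2.0):
    for noise in ("sgd", "isotropic"):
        runs = [escape_steps(a, noise) for _ in range(20)]
        print(f"a = {a:.1f}  {noise:>9}: mean escape steps ~ {np.mean(runs):,.0f}")

With these settings the effective temperature is T = lr * C / 2, so the barrier-to-temperature ratio a / (4 * lr * a) is independent of a for the curvature-scaled noise but grows linearly in a for the isotropic noise, which is the exponential separation the abstract describes.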