Keywords: stochastic optimization, energy stability, momentum
Abstract: In this paper, we propose SGDEM, Stochastic Gradient Descent with Energy and Momentum, to solve a large class of general nonconvex stochastic optimization problems, building on the AEGD method introduced in [AEGD: Adaptive Gradient Descent with Energy. arXiv: 2010.05109]. SGDEM incorporates energy and momentum simultaneously so as to inherit their combined advantages. We show that SGDEM features an unconditional energy stability property, and we derive energy-dependent convergence rates in the general nonconvex stochastic setting as well as a regret bound in the online convex setting. We also provide a lower threshold for the energy variable. Our experimental results show that SGDEM converges faster than AEGD and generalizes better than, or at least as well as, SGDM when training several deep neural networks.
One-sentence Summary: We propose SGDEM, Stochastic Gradient Descent with Energy and Momentum, to solve a large class of general nonconvex stochastic optimization problems.
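The abstract does not spell out the update rule. As a rough illustration only, the Python sketch below combines an AEGD-style element-wise energy variable with a standard momentum buffer; the function name sgdem_step, the exact way momentum enters the energy update, and the hyperparameter values are assumptions made for this sketch rather than the paper's algorithm.

import numpy as np

def sgdem_step(theta, grad, f_val, r, m, lr=0.1, mu=0.9, c=1.0):
    # v is the gradient of sqrt(f + c), the transformed objective used by AEGD
    v = grad / (2.0 * np.sqrt(f_val + c))
    # momentum buffer: exponential moving average of v (assumed form)
    m = mu * m + (1.0 - mu) * v
    # element-wise energy update; r can only decrease, mirroring the
    # unconditional energy stability claimed in the abstract
    r = r / (1.0 + 2.0 * lr * m * m)
    # energy-scaled descent step
    theta = theta - 2.0 * lr * r * m
    return theta, r, m

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
f = lambda x: 0.5 * float(np.dot(x, x))
theta = np.array([3.0, -2.0])
r = np.full_like(theta, np.sqrt(f(theta) + 1.0))  # energy starts at sqrt(f(theta_0) + c)
m = np.zeros_like(theta)
for _ in range(200):
    theta, r, m = sgdem_step(theta, theta.copy(), f(theta), r, m)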