- TL;DR: We propose an efficient and effective step size adaptation method for the gradient methods.
- Abstract: This paper proposes a new approach for step size adaptation in gradient methods. The proposed method called step size optimization (SSO) formulates the step size adaptation as an optimization problem which minimizes the loss function with respect to the step size for the given model parameters and gradients. Then, the step size is optimized based on alternating direction method of multipliers (ADMM). SSO does not require the second-order information or any probabilistic models for adapting the step size, so it is efficient and easy to implement. Furthermore, we also introduce stochastic SSO for stochastic learning environments. In the experiments, we integrated SSO to vanilla SGD and Adam, and they outperformed state-of-the-art adaptive gradient methods including RMSProp, Adam, L4-Adam, and AdaBound on extensive benchmark datasets.
- Keywords: Deep Learning, Step Size Adaptation, Nonconvex Optimization
- Original Pdf: pdf