- Abstract: We consider new variants of optimization algorithms. Our algorithms are based on the observation that mini-batch of stochastic gradients in consecutive iterations do not change drastically and consequently may be predictable. Inspired by the similar setting in online learning literature called Optimistic Online learning, we propose two new optimistic algorithms for AMSGrad and Adam, respectively, by exploiting the predictability of gradients. The new algorithms combine the idea of momentum method, adaptive gradient method, and algorithms in Optimistic Online learning, which leads to speed up in training deep neural nets in practice.
- Keywords: optimization, Adam, AMSGrad
- TL;DR: We consider new variants of optimization algorithms for training deep nets.