Incorporating Nesterov Momentum into Adam

Timothy Dozat

Feb 18, 2016 (modified: Feb 18, 2016) ICLR 2016 workshop submission
  • Abstract: This work aims to improve upon the recently proposed and rapidly popularized optimization algorithm Adam (Kingma & Ba, 2014). Adam has two main components: a momentum component and an adaptive learning rate component. However, regular momentum can be shown conceptually and empirically to be inferior to a similar algorithm known as Nesterov’s accelerated gradient (NAG). We show how to modify Adam’s momentum component to take advantage of insights from NAG, and then we present preliminary evidence suggesting that making this substitution improves the speed of convergence and the quality of the learned models.
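The abstract does not spell out the modified update rule, so the following is only a rough sketch of what replacing Adam's momentum term with a Nesterov-style look-ahead might look like. It follows one commonly cited simplified formulation with a constant momentum coefficient, which is an assumption here and not taken from the text above. The function names `nadam_step` and `minimize_quadratic`, the default hyperparameters, and the toy objective are all hypothetical and chosen for illustration.

```python
import math


def nadam_step(theta, grad, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Nesterov-flavoured Adam step on a single scalar parameter.

    theta: current parameter, grad: gradient at theta,
    m, v: running first and second moment estimates, t: step count (from 1).
    """
    # Adam's two components: momentum (m) and adaptive scaling (v).
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad

    # Bias corrections; the momentum term is corrected one step ahead (t + 1)
    # because it is about to be used as a look-ahead (an assumed convention).
    m_hat = m / (1.0 - beta1 ** (t + 1))
    g_hat = grad / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    # Nesterov-style blend: look-ahead momentum plus the current gradient.
    m_bar = beta1 * m_hat + (1.0 - beta1) * g_hat

    theta = theta - lr * m_bar / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_quadratic(steps=1000, start=5.0):
    """Toy usage: minimise f(theta) = theta**2 from a fixed starting point."""
    theta, m, v = start, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * theta  # derivative of theta**2
        theta, m, v = nadam_step(theta, grad, m, v, t)
    return theta
```

The only difference from a plain Adam step in this sketch is the `m_bar` line: plain Adam would divide the bias-corrected `m` by the root of `v_hat` directly, whereas here the current gradient is mixed back in alongside the momentum term, in the spirit of NAG.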