Abstract: This work aims to improve upon the recently proposed and rapidly popularized optimization algorithm Adam (Kingma & Ba, 2014). Adam has two main components: a momentum component and an adaptive learning rate component. However, regular momentum can be shown, both conceptually and empirically, to be inferior to a similar algorithm known as Nesterov's accelerated gradient (NAG). We show how to modify Adam's momentum component to take advantage of insights from NAG, and we then present preliminary evidence suggesting that this substitution improves both the speed of convergence and the quality of the learned models.
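For concreteness, the sketch below shows one way a Nesterov-style look-ahead can be folded into Adam's per-parameter update, assuming the standard Adam moment estimates and bias corrections; the function name, hyperparameter defaults, and NumPy formulation are illustrative and are not taken from the paper.

```python
import numpy as np

def nadam_step(theta, grad, m, v, t, lr=0.002,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One Nadam-style update: Adam's adaptive step with a Nesterov-style
    look-ahead applied to the bias-corrected first-moment estimate.
    (Illustrative sketch; notation does not follow the paper.)"""
    # Adam's exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Standard bias corrections (t is the 1-indexed step count).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Nesterov-style momentum: blend the corrected momentum estimate with
    # the current bias-corrected gradient instead of using m_hat alone.
    m_nesterov = beta1 * m_hat + (1 - beta1) * grad / (1 - beta1 ** t)
    theta = theta - lr * m_nesterov / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The only change from plain Adam is the final blend of the corrected momentum with the current bias-corrected gradient, which plays the role of NAG's look-ahead step; the adaptive learning rate component is left untouched.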