Abstract: This work aims to improve upon the recently proposed and rapidly popularized optimization algorithm Adam (Kingma & Ba, 2014). Adam has two main components: a momentum component and an adaptive learning rate component. However, regular momentum can be shown, both conceptually and empirically, to be inferior to a similar algorithm known as Nesterov's accelerated gradient (NAG). We show how to modify Adam's momentum component to take advantage of insights from NAG, and we then present preliminary evidence suggesting that this substitution improves both the speed of convergence and the quality of the learned models.
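For concreteness, the sketch below shows one way a Nesterov-style look-ahead can be folded into Adam's per-parameter update, assuming the standard Adam moment estimates and bias corrections; the function name, hyperparameter defaults, and NumPy formulation are illustrative and are not taken from the paper.

```python
import numpy as np

def nadam_step(theta, grad, m, v, t, lr=0.002,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One Nadam-style update: Adam's adaptive step with a Nesterov-style
    look-ahead applied to the bias-corrected first-moment estimate.
    (Illustrative sketch; notation does not follow the paper.)"""
    # Adam's exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Standard bias corrections (t is the 1-indexed step count).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Nesterov-style momentum: blend the corrected momentum estimate with
    # the current bias-corrected gradient instead of using m_hat alone.
    m_nesterov = beta1 * m_hat + (1 - beta1) * grad / (1 - beta1 ** t)
    theta = theta - lr * m_nesterov / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The only change from plain Adam is the final blend of the corrected momentum with the current bias-corrected gradient, which plays the role of NAG's look-ahead step; the adaptive learning rate component is left untouched.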