Amortized Nesterov's Momentum: Robust and Lightweight Momentum for Deep Learning

Kaiwen Zhou; Yanghua Jin; Qinghua Ding; James Cheng

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Blind Submission, Readers: Everyone

TL;DR: Amortizing Nesterov's momentum for more robust, lightweight and fast deep learning training.

Abstract: Stochastic Gradient Descent (SGD) with Nesterov's momentum is a widely used optimizer in deep learning, which is observed to have excellent generalization performance. However, due to the large stochasticity, SGD with Nesterov's momentum is not robust, i.e., its performance may deviate significantly from the expectation. In this work, we propose Amortized Nesterov's Momentum, a special variant of Nesterov's momentum which has more robust iterates, faster convergence in the early stage and higher efficiency. Our experimental results show that this new momentum achieves similar (sometimes better) generalization performance with little-to-no tuning. In the convex case, we provide optimal convergence rates for our new methods and discuss how the theorems explain the empirical results.
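For orientation, the baseline the abstract refers to, SGD with Nesterov's momentum, can be sketched as follows. This is a minimal illustration of the standard update in the form commonly used by deep learning frameworks, not the amortized variant proposed in the paper. The function names `sgd_nesterov` and `quadratic_grad`, the hyperparameter values, and the toy quadratic objective are all assumptions chosen for the example, and the gradient here is deterministic rather than a stochastic mini-batch estimate.

```python
def sgd_nesterov(grad, x0, lr=0.1, momentum=0.9, steps=100):
    """Standard SGD with Nesterov's momentum (baseline, not the amortized variant).

    grad: hypothetical callable returning the gradient at x as a list of floats.
    """
    x = list(x0)
    v = [0.0] * len(x)  # momentum buffer, starts at zero
    for _ in range(steps):
        g = grad(x)
        # Accumulate the gradient into the momentum buffer: v <- momentum * v + g
        v = [momentum * vi + gi for vi, gi in zip(v, g)]
        # Nesterov step: move along g + momentum * v instead of v alone
        x = [xi - lr * (gi + momentum * vi) for xi, gi, vi in zip(x, g, v)]
    return x


def quadratic_grad(x):
    # Gradient of the toy objective f(x) = sum(x_i ** 2), assumed for illustration
    return [2.0 * xi for xi in x]


x_final = sgd_nesterov(quadratic_grad, [1.0, -2.0])
```

On this toy problem the iterates approach the minimizer at the origin. In actual training, `grad` would return a noisy mini-batch gradient, which is the source of the stochasticity the abstract describes.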

Code: https://gofile.io/?c=e26cUT

Keywords: momentum, nesterov, optimization, deep learning, neural networks
