Does Adam Converge and When?

Anonymous

Published: 28 Mar 2022, Last Modified: 05 May 2023
BT@ICLR2022
Readers: Everyone
Keywords: Adam, optimization, deep learning
Abstract: In this blog post, we revisit the (non-)convergence behavior of Adam. In particular, we briefly review the non-convergence results of Reddi et al. '19 and the convergence results of Shi et al. '20. Their results take important steps toward a better understanding of Adam. However, the convergence analysis of Shi et al. '20 requires $\beta_1$ to be either 0 or small enough ($\beta_1$ is the momentum hyperparameter in Adam). Is this a reasonable requirement? If not, how large is the gap between theory and practice? In this blog, we discuss these questions from multiple perspectives. We show that the gap is in fact non-negligible, and that the discussion on the convergence of Adam is far from concluded.
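For context, here is a minimal sketch of the Adam update rule (following Kingma & Ba; bias-correction terms omitted for brevity), showing where the momentum hyperparameter $\beta_1$ enters:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \\
x_{t+1} &= x_t - \frac{\alpha}{\sqrt{v_t} + \epsilon}\, m_t,
\end{aligned}
$$

where $g_t$ is the stochastic gradient at step $t$ and $\alpha$ is the learning rate. Larger $\beta_1$ means heavier momentum; the analyses discussed in this post differ in how large $\beta_1$ may be while convergence is still guaranteed.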
ICLR Paper: https://openreview.net/forum?id=ryQu7f-RZ, https://openreview.net/forum?id=3UDSdyIcBDA