Keywords: Adam optimizer, hyperparameter tuning, discounted regret, online-to-nonconvex
Abstract: While Adam is one of the most effective optimizers for training large-scale machine learning models, a theoretical understanding of how to optimally set its momentum factors, $\beta_1$ and $\beta_2$, remains largely incomplete.
Prior works have shown that Adam can be seen as an instance of Follow-the-Regularized-Leader (FTRL), one of the most important classes of algorithms in online learning.
These analyses, however, required setting $\beta_1 = \sqrt{\beta_2}$, which does not cover the more practical cases where $\beta_1 \neq \sqrt{\beta_2}$.
We derive novel, more general analyses that hold for both $\beta_1 \geq \sqrt{\beta_2}$ and $\beta_1 \leq \sqrt{\beta_2}$.
In both cases, our results strictly generalize the existing bounds.
Furthermore, we show that our bounds are tight in the worst case.
We also prove that setting $\beta_1 = \sqrt{\beta_2}$ is optimal for an oblivious adversary, but sub-optimal for a non-oblivious adversary.
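For reference, a minimal statement of the standard Adam update with momentum factors $\beta_1$ and $\beta_2$ (bias-corrected form; the step size $\eta$, stabilizer $\epsilon$, gradient $g_t$, and iterate $x_t$ are standard notation not introduced in the abstract):
$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad & v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad & \hat{v}_t &= \frac{v_t}{1-\beta_2^t}, \\
x_{t+1} &= x_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}. &&
\end{aligned}
$$
The special setting $\beta_1 = \sqrt{\beta_2}$ discussed above is the regime in which prior FTRL-based analyses of this update apply.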
Submission Number: 166