Keywords: Online learning, regret, Adam, nonconvex optimization, changing environments
Abstract: Adaptive optimizers, most notably Adam (and AdamW), are ubiquitous in large-scale first-order training. Yet many theoretical treatments either omit or distort the very features that drive Adam's empirical success, such as momentum and bias correction. Building on recent online-to-nonconvex reductions, we develop a refined discounted-to-nonconvex analysis that respects these features and yields guarantees under a statistically grounded setting. Our key technical contributions are twofold. First, we formalize an online learning with shifting stochastic environments framework that aligns with nonsmooth, nonconvex stochastic optimization and sharpens how discounted regret translates to optimality conditions. Second, we introduce Adam-FTRL, an online algorithm that exactly matches the plain Adam update in vector form, and prove competitive discounted regret bounds without clipping or unrealistic parameter couplings. Via our conversion, these bounds imply robust convergence of Adam-FTRL to $(\lambda,\rho)$-stationary points, achieving the optimal iteration complexity under favorable environmental stochasticity and shift complexity. The analysis further highlights two environment measures: a normalized signal-to-noise ratio (NSNR) and a discounted shift complexity, which govern convergence behavior and help explain the conditions under which Adam attains its theoretical guarantees.
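The abstract refers to "the plain Adam update in vector form," including momentum and bias correction. As background (this is the standard Adam update of Kingma & Ba, not the paper's Adam-FTRL analysis; hyperparameter names and values below are the usual defaults, chosen here for illustration), a minimal sketch:

```python
import numpy as np

def adam_step(x, m, v, g, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One plain Adam step on iterate x given gradient g at step t >= 1."""
    m = b1 * m + (1 - b1) * g           # first-moment EMA (momentum)
    v = b2 * v + (1 - b2) * g**2        # second-moment EMA
    m_hat = m / (1 - b1**t)             # bias correction for m
    v_hat = v / (1 - b2**t)             # bias correction for v
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Illustrative usage: minimize f(x) = ||x||^2 / 2, whose gradient is x.
x = np.array([1.0, -2.0])
m, v = np.zeros_like(x), np.zeros_like(x)
for t in range(1, 501):
    x, m, v = adam_step(x, m, v, g=x, t=t, lr=0.05)
print(np.linalg.norm(x))  # the iterate norm shrinks toward 0
```

The paper's claim is that an FTRL-style online learner reproducing exactly this vector-form update admits discounted regret bounds without clipping or parameter coupling.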
Primary Area: optimization
Submission Number: 21021