The Curious Case of AdamW

ICLR 2026 Conference Submission 21688 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: AdamW, optimization, convergence, dynamical systems, stability analysis
TL;DR: AdamW’s fixed points match a regularized implicit objective, but many are unstable. This misalignment makes AdamW non-convergent on simple functions.
Abstract: AdamW is ubiquitous in deep learning, yet its behavior remains poorly understood. We analyze its dynamics through the lens of dynamical systems and show that AdamW admits an *implicit objective*: its fixed points coincide with the stationary points of a constrained and regularized optimization problem. However, not all of these fixed points are stable under AdamW’s dynamics, and stability depends sensitively on curvature, weight decay, and momentum parameters. Even in simple one-dimensional settings, AdamW can exhibit surprisingly complex behavior: equilibria may be unstable and trajectories can fall into persistent limit cycles. We further extend the analysis to higher dimensions, deriving sufficient conditions for stability, and validate empirically that when AdamW converges in neural network training, it converges to stable equilibria. These results clarify what optimization problem AdamW is associated with, when convergence can be expected, and how its curious dynamics could inspire the development of more reliable optimization algorithms in the future.
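To make the abstract's claim about limit cycles concrete, here is a minimal sketch (not the authors' code or experimental setup) that runs the standard decoupled AdamW update on a one-dimensional quadratic; all hyperparameter values are illustrative assumptions chosen only to show how one would inspect whether the iterates settle at a fixed point or keep oscillating.

```python
# Minimal sketch: plain-Python AdamW on f(w) = 0.5 * c * w**2 with decoupled weight decay.
# Hyperparameters below are illustrative assumptions, not the paper's settings.
import math

def adamw_trajectory(c=10.0, w0=1.0, lr=0.1, wd=0.1,
                     beta1=0.9, beta2=0.999, eps=1e-8, steps=5000):
    w, m, v = w0, 0.0, 0.0
    traj = []
    for t in range(1, steps + 1):
        g = c * w                        # gradient of 0.5 * c * w**2
        m = beta1 * m + (1 - beta1) * g  # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)     # bias correction
        v_hat = v / (1 - beta2 ** t)
        # Decoupled weight decay: wd * w is applied directly to the parameter,
        # not folded into the gradient (the defining feature of AdamW).
        w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)
        traj.append(w)
    return traj

traj = adamw_trajectory()
# If the dynamics converged, the tail of the trajectory would shrink toward a
# fixed point; a persistent oscillation in the tail is the kind of non-convergent,
# limit-cycle-like behavior the abstract describes in simple 1D settings.
print("last 5 iterates:", [round(x, 6) for x in traj[-5:]])
```

Varying the curvature `c`, the step size `lr`, the weight decay `wd`, and the momentum parameters in this sketch is one way to probe the sensitivity of stability that the abstract refers to.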
Primary Area: learning theory
Submission Number: 21688