Keywords: Optimization, Deep Learning
Abstract: The Adam optimizer remains the default choice in deep learning, offering reliable performance across diverse architectures and tasks.
In this work, we reinterpret Adam from a signal-processing perspective—viewing its gradient update as a momentum estimate normalized by noise amplitude—and propose a simple modification: replacing the second raw moment with the second central moment (variance).
We show that centering provides a more accurate estimate of noise amplitude, allowing the optimizer to normalize the impact of gradient noise uniformly across the loss landscape and to dynamically scale momentum elements according to their signal-to-noise ratio.
Empirically, this modification yields consistent performance gains over Adam and its variants across multiple learning paradigms and neural network architectures, including reinforcement learning and sequence modeling.
Notably, on reinforcement learning benchmarks such as MuJoCo, where Adam remains the gold standard due to non-stationarity and the absence of reliable learning rate schedules, our centered variant, "Adam+", achieves faster convergence and improved stability than Adam.
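A minimal sketch of the centering idea described in the abstract, for illustration only: the second-moment exponential moving average tracks (g - m)^2 (a variance, i.e. noise-amplitude estimate) instead of g^2. The function name `adam_plus_step`, the hyperparameter defaults, and details such as bias correction and whether the current or previous momentum is used for centering are assumptions, not taken from the paper.

```python
import numpy as np

def adam_plus_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One hypothetical update step of a centered-Adam variant.

    Only the second-moment line differs from standard Adam: it accumulates the
    squared deviation of the gradient from the momentum estimate (a variance)
    rather than the raw squared gradient.
    """
    t = state["t"] + 1
    m = beta1 * state["m"] + (1 - beta1) * grad             # first moment (momentum)
    v = beta2 * state["v"] + (1 - beta2) * (grad - m) ** 2  # second *central* moment
    m_hat = m / (1 - beta1 ** t)                             # bias-corrected momentum
    v_hat = v / (1 - beta2 ** t)                             # bias-corrected variance
    new_param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return new_param, {"t": t, "m": m, "v": v}

# Usage on a toy quadratic, minimizing 0.5 * ||x||^2 (gradient is x itself):
x = np.array([1.0, -2.0])
state = {"t": 0, "m": np.zeros_like(x), "v": np.zeros_like(x)}
for _ in range(200):
    x, state = adam_plus_step(x, grad=x, state=state)
print(x)  # approaches the origin
```

Dividing the momentum by this centered estimate is what the abstract describes as scaling each element by its signal-to-noise ratio: coordinates whose gradients fluctuate widely around their mean take smaller effective steps.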
Primary Area: optimization
Submission Number: 19708