Refresh-Scaling the Memory of Balanced Adam

Published: 29 May 2026, Last Modified: 29 May 2026HiLD at ICML 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Adam, momentum scaling, optimizer memory, effective learning horizon
TL;DR: Balanced Adam (with $\beta_1=\beta_2$) is more robust when $\beta$ is treated as a scale-aware memory clock and matched to the effective learning horizon via $R_\beta=(1-\beta)T_{\mathrm{ES}}\approx1000$.
Abstract: Recent evidence suggests that Adam performs robustly when its momentum parameters are tied, $\beta_1=\beta_2$, reducing the optimizer to a single remaining parameter. However, how this parameter should be set remains poorly understood. We argue that, in balanced Adam, $\beta$ should not be treated as a dimensionless constant: it defines a statistical memory horizon $H_\beta=(1-\beta)^{-1}$. In terms of the effective learning horizon $T_{\mathrm{ES}}$, estimated from the validation trajectory, we study the refresh count $R_\beta=(1-\beta)T_{\mathrm{ES}}$, which measures how many times Adam renews its internal statistics during the useful phase of training. Across 11 vision and language experiments, we find that choosing $\beta$ so that $R_\beta\approx1000$ selects different $\beta$ values depending on the training scale, yet improves robustness over the best fixed-beta baseline. Compared with the strongest fixed choice $\beta=0.944$, the refresh rule improves worst-case robustness, reducing the maximum relative gap in validation loss by 33.4\%, while bringing all 11 runs within 1\% of their validation oracle. These results suggest that the remaining hyperparameter of balanced Adam is more naturally viewed as a memory-scale variable than as a fixed constant. This provides a simple budget-aware perspective on optimizer scaling and opens a path toward treating Adam's momentum as part of the learning dynamics rather than as a static default.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 99
Loading