Keywords: gradient regularization; central-difference; momentum lookahead; variance reduction
TL;DR: We introduce CMLR, an optimizer that realizes momentum lookahead through central-difference probing, theoretically reducing variance and empirically improving generalization across diverse architectures at negligible extra cost.
Abstract: Sharpness-Aware Minimization (SAM) is an effective technique for improving generalization by guiding optimizers towards flat minima through parameter perturbations. However, extending such regularization strategies to multi-step settings often leads to instability, where naive iterative updates degrade rather than enhance generalization. To overcome this limitation, we propose Central-difference Momentum Lookahead Regularization (CMLR), a framework that performs momentum lookahead through central-difference probing of the loss landscape. By constructing the perturbation direction from symmetric gradient evaluations, CMLR realizes a momentum lookahead update that is inherently more robust and exhibits reduced variance, while requiring no additional gradient evaluations. This design ensures smooth optimization trajectories and reliable improvements at low computational cost. We establish formal convergence guarantees together with a variance reduction analysis for CMLR, and empirically demonstrate that it consistently improves generalization across diverse architectures and datasets.
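To make the idea concrete, here is a minimal sketch of what one central-difference momentum-lookahead step could look like. The paper's actual algorithm is not shown on this page, so every detail below (function names, the probe radius `rho`, the momentum coefficient `beta`, and how the symmetric probes are combined) is an illustrative assumption, not CMLR as published: the loss landscape is probed symmetrically along the normalized momentum direction, and the two gradient evaluations are averaged to form the update direction.

```python
import numpy as np

def cmlr_step(w, grad_fn, m, lr=0.1, beta=0.9, rho=0.05):
    """One hypothetical CMLR-style step (illustrative sketch, not the paper's algorithm).

    Probes the loss landscape symmetrically along the momentum (lookahead)
    direction via a central difference, then updates with the averaged gradient.
    """
    d = m / (np.linalg.norm(m) + 1e-12)   # normalized momentum lookahead direction
    g_plus = grad_fn(w + rho * d)         # forward probe along the lookahead direction
    g_minus = grad_fn(w - rho * d)        # backward probe (symmetric)
    g = 0.5 * (g_plus + g_minus)          # central-difference average of the two probes
    m_new = beta * m + (1 - beta) * g     # exponential-moving-average momentum update
    w_new = w - lr * m_new
    return w_new, m_new

# Toy usage: quadratic loss f(w) = 0.5 * ||w||^2, so grad_fn(w) = w.
grad_fn = lambda w: w
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
for _ in range(100):
    w, m = cmlr_step(w, grad_fn, m)
```

On this toy quadratic the symmetric probes cancel exactly and the iterate spirals toward the minimum; the sketch only illustrates the central-difference structure, not the variance-reduction or convergence claims of the paper.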
Supplementary Material: zip
Primary Area: optimization
Submission Number: 17648