Sharpness-Aware Lookahead for Accelerating Convergence and Improving Generalization

Published: 01 Jan 2024, Last Modified: 13 May 2025. IEEE Trans. Pattern Anal. Mach. Intell. 2024. CC BY-SA 4.0
Abstract: Lookahead is a popular stochastic optimizer that can accelerate the training of deep neural networks. However, the solutions found by Lookahead often generalize worse than those found by its base optimizers, such as SGD and Adam. To address this issue, we propose Sharpness-Aware Lookahead (SALA), a novel optimizer that aims to identify flat minima that generalize well. SALA divides training into two stages. In the first stage, the direction towards flat regions is determined by a quadratic approximation of the optimization trajectory, without incurring any extra computational overhead. In the second stage, the direction is instead determined by Sharpness-Aware Minimization (SAM), which is particularly effective at improving generalization during the terminal phase of training. In contrast to Lookahead, SALA retains the benefits of accelerated convergence while also enjoying better generalization than the base optimizer. Theoretical analysis of the expected excess risk, as well as empirical results on canonical neural network architectures and datasets, demonstrates the advantages of SALA over Lookahead. Notably, with approximately 25% more computational overhead than the base optimizer, SALA achieves the same generalization performance as SAM, which requires twice the training budget of the base optimizer.
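
The sketch below is only an illustration of the two-stage structure described in the abstract, not the authors' algorithm: the exact quadratic-trajectory update of stage 1 and the stage-switching rule are given in the paper, so here stage 1 is approximated by a plain Lookahead-style slow-weight interpolation and stage 2 by a standard SAM step around a base optimizer. The class name `TwoStageSketch` and the hyperparameters `k`, `alpha`, `rho`, and `stage_two_start` are illustrative assumptions.

```python
# Hedged sketch: Lookahead-style stage 1 + SAM-style stage 2 around a base
# optimizer. SALA's actual quadratic approximation of the trajectory is NOT
# implemented here; stage 1 only interpolates slow and fast weights.
import torch


class TwoStageSketch:
    def __init__(self, model, base_optimizer, k=5, alpha=0.5, rho=0.05,
                 stage_two_start=1000):
        self.model = model
        self.base = base_optimizer          # e.g. SGD or Adam on model.parameters()
        self.k = k                          # fast steps per slow-weight update (stage 1)
        self.alpha = alpha                  # slow-weight interpolation factor (stage 1)
        self.rho = rho                      # SAM perturbation radius (stage 2)
        self.stage_two_start = stage_two_start
        self.step_count = 0
        self.slow = [p.detach().clone() for p in model.parameters()]

    def step(self, closure):
        # `closure` must zero gradients, recompute the loss, call backward(),
        # and return the loss.
        self.step_count += 1
        if self.step_count < self.stage_two_start:
            return self._stage_one_step(closure)
        return self._stage_two_step(closure)

    def _stage_one_step(self, closure):
        # Stage 1 (Lookahead-like): fast step with the base optimizer, then an
        # occasional slow-weight update. The paper replaces this plain
        # interpolation with a quadratic approximation of the trajectory.
        loss = closure()
        self.base.step()
        if self.step_count % self.k == 0:
            with torch.no_grad():
                for slow, p in zip(self.slow, self.model.parameters()):
                    slow += self.alpha * (p - slow)
                    p.copy_(slow)
        return loss

    def _stage_two_step(self, closure):
        # Stage 2: standard SAM step -- ascend to the worst-case point within
        # an L2 ball of radius rho, take the gradient there, restore the
        # weights, and descend with the base optimizer.
        loss = closure()
        grads = [p.grad.detach().clone() if p.grad is not None else None
                 for p in self.model.parameters()]
        grad_norm = torch.norm(torch.stack(
            [g.norm() for g in grads if g is not None]))
        scale = self.rho / (grad_norm + 1e-12)
        eps = []
        with torch.no_grad():
            for p, g in zip(self.model.parameters(), grads):
                e = g * scale if g is not None else None
                if e is not None:
                    p.add_(e)               # move to the perturbed point
                eps.append(e)
        closure()                           # gradients at the perturbed point
        with torch.no_grad():
            for p, e in zip(self.model.parameters(), eps):
                if e is not None:
                    p.sub_(e)               # restore the original weights
        self.base.step()                    # descend with the perturbed gradients
        return loss
```

Under these assumptions, the wrapper keeps stage 1 at roughly the cost of the base optimizer (one gradient per step) and only pays SAM's second gradient evaluation in stage 2, which is consistent with the abstract's claim of a modest overall overhead relative to running SAM for the whole budget.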