Keywords: Algorithmic stability, generalization analysis, Lookahead, SGD
TL;DR: We analyze Lookahead‑SGD via on‑average stability, remove global Lipschitz assumptions, and derive optimistic generalization bounds with linear minibatch speedup.
Abstract: The Lookahead optimizer employs a dual-weight (fast/slow) update mechanism that has been shown to improve the performance of underlying optimizers such as SGD when training deep learning models. However, most theoretical studies focus on its convergence on training data, leaving its generalization capabilities less understood. Existing generalization analyses are often limited by restrictive assumptions, such as requiring the loss function to be globally Lipschitz continuous, and their bounds do not fully capture the relationship between optimization and generalization. In this paper, we address these issues by conducting a rigorous stability and generalization analysis of the Lookahead optimizer with minibatch SGD. We leverage on-average model stability to derive generalization bounds for both convex and strongly convex problems without the restrictive Lipschitzness assumption. Our analysis demonstrates a linear speedup with respect to the batch size in the convex setting.
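For readers unfamiliar with the dual-weight update referenced in the abstract, below is a minimal Python sketch of Lookahead wrapped around minibatch SGD: the fast weights take k inner SGD steps, after which the slow weights are interpolated toward them. The hyperparameter names (k, alpha, eta, batch_size), the toy least-squares problem, and all constants are illustrative assumptions, not values from the paper.

```python
import numpy as np


def lookahead_sgd(grad_fn, n, w0, k=5, alpha=0.5, eta=0.1,
                  batch_size=32, outer_steps=100, seed=0):
    """Sketch of Lookahead on top of minibatch SGD (illustrative hyperparameters).

    grad_fn(w, idx) should return the minibatch gradient of the loss at w
    over the sample indices idx; n is the training-set size.
    """
    rng = np.random.default_rng(seed)
    slow = np.array(w0, dtype=float)          # slow ("lookahead") weights
    for _ in range(outer_steps):
        fast = slow.copy()                    # reset fast weights to slow weights
        for _ in range(k):                    # k inner minibatch-SGD steps
            idx = rng.choice(n, size=batch_size, replace=False)
            fast -= eta * grad_fn(fast, idx)
        slow += alpha * (fast - slow)         # interpolate slow toward fast weights
    return slow


if __name__ == "__main__":
    # Toy convex example: least-squares regression on synthetic data.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true + 0.01 * rng.normal(size=1000)

    def grad_fn(w, idx):
        Xb, yb = X[idx], y[idx]
        return Xb.T @ (Xb @ w - yb) / len(idx)  # minibatch least-squares gradient

    w_hat = lookahead_sgd(grad_fn, n=1000, w0=np.zeros(5))
    print("estimation error:", np.linalg.norm(w_hat - w_true))
```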
Primary Area: learning theory
Submission Number: 16926