Keywords: Algorithmic stability, generalization analysis, Lookahead, SGD
TL;DR: We analyze Lookahead‑SGD via on‑average stability, remove global Lipschitz assumptions, and derive optimistic generalization bounds with linear minibatch speedup.
Abstract: The Lookahead optimizer employs a dual-weight (fast/slow) update mechanism that has been shown to improve the performance of underlying optimizers such as SGD when training deep learning models. However, most theoretical studies focus on its convergence on training data, leaving its generalization capabilities less understood. Existing generalization analyses are often limited by restrictive assumptions, such as requiring the loss function to be globally Lipschitz continuous, and their bounds do not fully capture the relationship between optimization and generalization. In this paper, we address these issues by conducting a rigorous stability and generalization analysis of the Lookahead optimizer with minibatch SGD. We leverage on-average model stability to derive generalization bounds for both convex and strongly convex problems without the restrictive Lipschitzness assumption. Our analysis demonstrates a linear speedup with respect to the batch size in the convex setting.
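For readers unfamiliar with the dual-weight update referenced in the abstract, below is a minimal Python sketch of Lookahead wrapped around minibatch SGD: the fast weights take k inner SGD steps, after which the slow weights are interpolated toward them. The hyperparameter names (k, alpha, eta, batch_size), the toy least-squares problem, and all constants are illustrative assumptions, not values from the paper.

```python
import numpy as np


def lookahead_sgd(grad_fn, n, w0, k=5, alpha=0.5, eta=0.1,
                  batch_size=32, outer_steps=100, seed=0):
    """Sketch of Lookahead on top of minibatch SGD (illustrative hyperparameters).

    grad_fn(w, idx) should return the minibatch gradient of the loss at w
    over the sample indices idx; n is the training-set size.
    """
    rng = np.random.default_rng(seed)
    slow = np.array(w0, dtype=float)          # slow ("lookahead") weights
    for _ in range(outer_steps):
        fast = slow.copy()                    # reset fast weights to slow weights
        for _ in range(k):                    # k inner minibatch-SGD steps
            idx = rng.choice(n, size=batch_size, replace=False)
            fast -= eta * grad_fn(fast, idx)
        slow += alpha * (fast - slow)         # interpolate slow toward fast weights
    return slow


if __name__ == "__main__":
    # Toy convex example: least-squares regression on synthetic data.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true + 0.01 * rng.normal(size=1000)

    def grad_fn(w, idx):
        Xb, yb = X[idx], y[idx]
        return Xb.T @ (Xb @ w - yb) / len(idx)  # minibatch least-squares gradient

    w_hat = lookahead_sgd(grad_fn, n=1000, w0=np.zeros(5))
    print("estimation error:", np.linalg.norm(w_hat - w_true))
```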
Primary Area: learning theory
Submission Number: 16926