Keywords: Implicit bias, line-search, Polyak step size
TL;DR: Max-margin convergence rate of gradient descent with Polyak and line-search step sizes on separable data
Abstract: Recent works have shown that Polyak and line-search step sizes are effective for training deep neural networks. However, a theoretical understanding of their generalization performance is lacking. For overparameterized models, multiple solutions can generalize differently to unseen data despite all achieving zero training error. Given this, a natural question is whether an algorithm inherently prefers (without explicit regularization) certain simple solutions over others upon convergence, a phenomenon known as implicit bias/regularization. In this work, we characterize the implicit bias of gradient descent with Polyak and line-search step sizes in linear classification with the logistic or cross-entropy loss. Since these step sizes adapt to the local smoothness of the loss, we prove that the margin of their iterates converges to the maximum $l_2$-norm margin at an $\tilde{O}(\frac{1}{T})$ rate. In contrast to other adaptive step sizes that achieve the same rate [7] (also known as normalized gradient descent, NGD), line-search and Polyak step sizes do not depend on problem-specific constants that may not be accessible. Another subtle issue is that NGD can diverge on common losses with non-separable data, whereas line-search converges because it guarantees descent on the function value at each iteration. Finally, our analysis extends the game framework of Wang et al. [26] to logistic/cross-entropy losses.
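As a rough illustration only (not code from the paper), the sketch below runs gradient descent with the Polyak step size on the logistic loss for a synthetic linearly separable dataset, using the fact that the loss infimum is 0 in the separable case, and tracks the normalized $l_2$ margin of the iterates. The dataset, dimensions, and iteration count are arbitrary choices made for this example.

```python
# Minimal sketch (assumption: toy data, not the authors' setup) of gradient descent
# with the Polyak step size on the logistic loss, tracking the normalized l2 margin.
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: labels y_i = sign(<w*, x_i>) in {-1, +1}.
n, d = 100, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)

def logistic_loss(w):
    # mean_i log(1 + exp(-y_i <w, x_i>)), computed stably via logaddexp.
    return np.logaddexp(0.0, -y * (X @ w)).mean()

def logistic_grad(w):
    margins = y * (X @ w)
    # d/dw log(1 + exp(-m_i)) = -y_i x_i * sigmoid(-m_i); stable sigmoid via logaddexp.
    coef = -y * np.exp(-np.logaddexp(0.0, margins))
    return (coef[:, None] * X).mean(axis=0)

def normalized_margin(w):
    return np.min(y * (X @ w)) / (np.linalg.norm(w) + 1e-12)

w = np.zeros(d)
for t in range(2000):
    g = logistic_grad(w)
    # Polyak step size: (f(w_t) - f*) / ||grad f(w_t)||^2, with f* = 0 on separable data.
    eta = logistic_loss(w) / (g @ g + 1e-12)
    w = w - eta * g

print("normalized l2 margin of the final iterate:", normalized_margin(w))
```

One could track `normalized_margin(w)` across iterations to observe its approach to the maximum $l_2$-norm margin; a backtracking line search (e.g., Armijo) could replace the Polyak step size in the same loop.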
Submission Number: 26