Abstract: Recent research into the strong generalization performance of simple gradient-based methods on overparameterized models has shown that, when training a linear predictor on separable data with an exponentially-tailed loss function, the predictor converges in direction to the max-margin classifier, which explains why it does not overfit asymptotically. More recent findings show that overfitting is not a concern even non-asymptotically: finite-time generalization bounds have been derived for gradient flow, gradient descent (GD), and stochastic GD. In this work, we extend this line of research and obtain new finite-time generalization bounds for other popular first-order methods, namely normalized GD and Nesterov’s accelerated GD. Our results show that these methods, which converge faster in terms of training loss, also enjoy improved generalization in terms of test error.
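To make the setting concrete, here is a minimal, hypothetical sketch (not the paper's analysis or bounds): normalized GD on the logistic loss, an exponentially-tailed loss, over linearly separable synthetic data, tracking the normalized margin whose growth reflects convergence toward the max-margin direction. The data, step size, and iteration count are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linearly separable toy data: labels are the sign of <w_star, x>.
n, d = 200, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)

def logistic_loss_grad(w):
    """Average logistic loss and its gradient at w (numerically stable)."""
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))
    # d/dw log(1 + exp(-y<w,x>)) = -y x * sigmoid(-y<w,x>); sigmoid(-m) = 0.5*(1 - tanh(m/2))
    coef = -y * 0.5 * (1.0 - np.tanh(0.5 * margins))   # shape (n,)
    grad = (coef[:, None] * X).mean(axis=0)            # shape (d,)
    return loss, grad

# Normalized GD: step along the gradient scaled to unit norm, so the iterate norm
# grows while its direction stabilizes (toward the max-margin direction on separable data).
w = np.zeros(d)
eta = 1.0  # illustrative step size
for t in range(1, 1001):
    loss, grad = logistic_loss_grad(w)
    w -= eta * grad / (np.linalg.norm(grad) + 1e-12)

# Normalized margin: min_i y_i <w, x_i> / ||w||.
norm_margin = np.min(y * (X @ w)) / np.linalg.norm(w)
print(f"final loss: {loss:.4e}, normalized margin: {norm_margin:.4f}")
```

Running this, the training loss decreases rapidly and the normalized margin stabilizes at a positive value; the paper's results concern finite-time test-error guarantees for such methods (and for Nesterov's accelerated GD), which this sketch does not attempt to reproduce.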