Abstract: Recent research into the strong generalization performance of simple gradient-based methods on overparameterized models has shown that, when training a linear predictor on separable data with an exponentially-tailed loss function, the predictor converges in direction to the max-margin classifier, which explains why it does not overfit asymptotically. More recent findings show that overfitting is not a concern even non-asymptotically: finite-time generalization bounds have been derived for gradient flow, gradient descent (GD), and stochastic GD. In this work, we extend this line of research and obtain new finite-time generalization bounds for other popular first-order methods, namely normalized GD and Nesterov’s accelerated GD. Our results show that these methods, which converge faster in terms of training loss, also enjoy improved generalization in terms of test error.
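To make the setting concrete, here is a minimal, hypothetical sketch (not the paper's analysis or bounds): normalized GD on the logistic loss, an exponentially-tailed loss, over linearly separable synthetic data, tracking the normalized margin whose growth reflects convergence toward the max-margin direction. The data, step size, and iteration count are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linearly separable toy data: labels are the sign of <w_star, x>.
n, d = 200, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)

def logistic_loss_grad(w):
    """Average logistic loss and its gradient at w (numerically stable)."""
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))
    # d/dw log(1 + exp(-y<w,x>)) = -y x * sigmoid(-y<w,x>); sigmoid(-m) = 0.5*(1 - tanh(m/2))
    coef = -y * 0.5 * (1.0 - np.tanh(0.5 * margins))   # shape (n,)
    grad = (coef[:, None] * X).mean(axis=0)            # shape (d,)
    return loss, grad

# Normalized GD: step along the gradient scaled to unit norm, so the iterate norm
# grows while its direction stabilizes (toward the max-margin direction on separable data).
w = np.zeros(d)
eta = 1.0  # illustrative step size
for t in range(1, 1001):
    loss, grad = logistic_loss_grad(w)
    w -= eta * grad / (np.linalg.norm(grad) + 1e-12)

# Normalized margin: min_i y_i <w, x_i> / ||w||.
norm_margin = np.min(y * (X @ w)) / np.linalg.norm(w)
print(f"final loss: {loss:.4e}, normalized margin: {norm_margin:.4f}")
```

Running this, the training loss decreases rapidly and the normalized margin stabilizes at a positive value; the paper's results concern finite-time test-error guarantees for such methods (and for Nesterov's accelerated GD), which this sketch does not attempt to reproduce.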