Momentum Doesn't Change The Implicit Bias

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 | ICLR 2022 Submitted | Readers: Everyone
Keywords: Momentum-based Optimizers, Convergence Analysis, Implicit Bias
Abstract: The momentum acceleration technique is widely adopted in many optimization algorithms. However, how momentum affects the generalization performance of these algorithms is still poorly understood in theory. In this paper, we answer this question by analyzing the implicit bias of momentum-based optimization. We prove that both SGD with momentum and Adam converge to the $L_2$ max-margin solution for exponential-tailed loss, the same solution reached by vanilla gradient descent. This means that these momentum-accelerated optimizers still converge to a model with low complexity, which provides guarantees on their generalization. Technically, to overcome the difficulty brought by error accumulation when analyzing momentum, we construct new Lyapunov functions as a tool to bound the gap between the model parameters and the max-margin solution.
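To make the claim concrete, below is a minimal numerical sketch, not taken from the paper: the dataset, step size, and momentum coefficient are illustrative assumptions, and the full-batch heavy-ball update stands in for the SGD-with-momentum setting analyzed in the paper. It runs gradient descent with momentum on a separable linear classification problem with exponential loss and compares the normalized iterate against the $L_2$ max-margin direction.

```python
import numpy as np

# Minimal sketch (not the paper's code or experiments): full-batch gradient
# descent with heavy-ball momentum on a tiny linearly separable dataset with
# the exponential loss sum_i exp(-y_i * w.x_i). The dataset, step size, and
# momentum coefficient are illustrative choices. The paper's claim is that
# the direction w / ||w|| converges to the L2 max-margin direction, just as
# it does for vanilla gradient descent.

X = np.array([[0.5, 1.5], [3.0, -1.0], [-0.5, -1.5], [-3.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.zeros(2)
v = np.zeros(2)          # momentum buffer
lr, beta = 0.1, 0.9      # step size and momentum coefficient

for t in range(200_000):
    margins = y * (X @ w)
    grad = -(np.exp(-margins) * y) @ X   # gradient of the exponential loss
    v = beta * v + grad                  # heavy-ball momentum update
    w = w - lr * v

print("normalized iterate :", w / np.linalg.norm(w))
print("data-mean direction:", (y @ X) / np.linalg.norm(y @ X))
# For this dataset the L2 max-margin direction is [1, 1]/sqrt(2) ~ [0.707, 0.707].
# The normalized iterate drifts toward it (directional convergence is only
# logarithmic in t), while the data-mean direction stays near [0.990, 0.141].
```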
One-sentence Summary: This paper provides the first analysis of the implicit bias of momentum-based optimizers for linear classification with exponential-tailed loss.