On the Convergence of Gradient Flow on Multi-layer Linear Models

Published: 01 Feb 2023, Last Modified: 13 Feb 2023
Submitted to ICLR 2023
Keywords: Multi-layer Linear Networks, Non-convex optimization, Gradient Flow, Training invariance
TL;DR: We study how initialization affects the convergence of gradient flow on multi-layer linear networks.
Abstract: In this paper, we analyze the convergence of gradient flow on a multi-layer linear model with a loss function of the form $f(W_1W_2\cdots W_L)$. We show that when $f$ satisfies the gradient dominance property, proper weight initialization leads to exponential convergence of the gradient flow to a global minimum of the loss. Moreover, the convergence rate depends on two trajectory-specific quantities that are controlled by the weight initialization: the \emph{imbalance matrices}, which measure the difference between the weights of adjacent layers, and the least singular value of the \emph{weight product} $W=W_1W_2\cdots W_L$. Our analysis provides improved rate bounds for several multi-layer network models studied in the literature, leading to novel characterizations of the effect of weight imbalance on the rate of convergence. Our results apply to most regression losses and extend to classification ones.
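To make the setting in the abstract concrete, below is a minimal numerical sketch (not the paper's implementation) of the two-layer special case with the squared-error loss $f(W_1W_2) = \tfrac{1}{2}\|W_1W_2 - A\|_F^2$. It uses small-step gradient descent as a discretization of gradient flow, and tracks one common form of the imbalance matrix, $D = W_1^\top W_1 - W_2 W_2^\top$ (the paper's exact definition may differ), which is conserved under the continuous-time flow. The matrix sizes, step size, and target $A$ are arbitrary illustrative choices.

```python
# Minimal sketch: gradient descent with a small step size as a discretization of
# gradient flow on a two-layer linear model  f(W1 W2) = 0.5 * ||W1 W2 - A||_F^2.
# Under gradient flow, the imbalance matrix D = W1^T W1 - W2 W2^T is conserved,
# so with a small step it should drift only slightly while the loss decays
# roughly exponentially. All dimensions and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, h, m = 5, 8, 4                 # W1: n x h, W2: h x m
A = rng.standard_normal((n, m))   # target matrix
W1 = rng.standard_normal((n, h)) * 0.5
W2 = rng.standard_normal((h, m)) * 0.5
eta = 1e-3                        # small step to approximate the flow

def loss(W1, W2):
    return 0.5 * np.linalg.norm(W1 @ W2 - A) ** 2

def imbalance(W1, W2):
    return W1.T @ W1 - W2 @ W2.T

D0 = imbalance(W1, W2)
for t in range(20001):
    R = W1 @ W2 - A               # residual W1 W2 - A
    g1 = R @ W2.T                 # gradient of f w.r.t. W1
    g2 = W1.T @ R                 # gradient of f w.r.t. W2
    W1 -= eta * g1
    W2 -= eta * g2
    if t % 5000 == 0:
        drift = np.linalg.norm(imbalance(W1, W2) - D0)
        print(f"step {t:6d}  loss {loss(W1, W2):.3e}  imbalance drift {drift:.3e}")
```

Running this, the printed loss should decrease at a roughly geometric rate while the imbalance drift stays near zero, illustrating (under these assumptions) the role the abstract attributes to the imbalance matrices and the least singular value of the product $W = W_1W_2$ in controlling the convergence rate.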
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Theory (eg, control theory, learning theory, algorithmic game theory)