Keywords: deep linear neural networks, non-convex optimization, gradient descent, initialization
TL;DR: For deep linear neural networks, we obtain sharp rates of convergence of gradient descent to a global optimum.
Abstract: This paper provides sharp rates of convergence of the gradient descent (GD) method for deep linear neural networks under different random initializations. This study touches on a major open theoretical problem in machine learning: why deep neural networks trained with GD methods are efficient in many practical applications. While a solution to this problem is still beyond reach for general nonlinear deep neural networks, there have been extensive efforts in the literature to study related questions for deep linear neural networks, and many interesting results have been obtained in this direction. For example, recent results on the loss landscape show that even though the loss function of deep linear neural networks is non-convex, every local minimizer is also a global minimizer. When GD is applied to train deep linear networks, it has been shown in the literature that its convergence behavior depends on the initialization. In this paper, we obtain the sharp rate of convergence of GD for deep linear networks, and we show that this rate does not depend on the type of random initialization. Furthermore, we show that the depth of the network does not affect the optimal rate of convergence, provided that the width of each hidden layer is sufficiently large.
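To make the setting concrete, here is a minimal Python sketch (ours, not the paper's code) of full-batch GD on a three-layer deep linear network, run once from a Gaussian and once from an orthogonal random initialization. It assumes whitened inputs, so the squared loss reduces, up to a constant, to 0.5 * ||W_N ... W_1 - Phi||_F^2 for a target matrix Phi. The target Phi, the width, depth, learning rate, step count, and both initialization scalings are illustrative assumptions. The sketch only shows the loss being driven toward zero from both initializations; it does not measure or verify the sharp rates or the width conditions stated in the abstract.

```python
import math
import random

# Minimal sketch (assumptions, not the paper's setup): whitened inputs, so the
# squared loss reduces to 0.5 * ||W_N ... W_1 - Phi||_F^2 for a target Phi.


def matmul(a, b):
    """Product of a (p x q) and a (q x r) matrix stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def chain(ws, dim):
    """Return ws[-1] @ ... @ ws[0]; identity(dim) if ws is empty."""
    out = identity(dim)
    for w in ws:
        out = matmul(w, out)
    return out


def gaussian_layer(rng, rows, cols):
    # Assumed scaling: i.i.d. N(0, 1/fan_in) entries.
    std = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def orthogonal_layer(rng, rows, cols):
    # Gram-Schmidt on Gaussian vectors: orthonormal rows if rows <= cols,
    # otherwise orthonormal columns (no extra scaling, an assumption).
    n, dim = min(rows, cols), max(rows, cols)
    basis = []
    while len(basis) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in basis:
            c = sum(x * y for x, y in zip(v, u))
            v = [x - c * y for x, y in zip(v, u)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:
            basis.append([x / norm for x in v])
    return basis if rows <= cols else transpose(basis)


def train(ws, target, lr, steps):
    """Full-batch GD on all layers; returns the loss recorded before each step."""
    d = len(ws[0][0])
    losses = []
    for _ in range(steps):
        err = [[p - t for p, t in zip(pr, tr)] for pr, tr in zip(chain(ws, d), target)]
        losses.append(0.5 * sum(e * e for row in err for e in row))
        grads = []
        for i, w in enumerate(ws):
            left = chain(ws[i + 1:], len(w))  # layers above W_{i+1} (1-indexed)
            right = chain(ws[:i], d)          # layers below W_{i+1}
            # dL/dW_{i+1} = left^T @ err @ right^T
            grads.append(matmul(matmul(transpose(left), err), transpose(right)))
        # Simultaneous update of every layer (plain GD, no momentum).
        ws = [[[x - lr * g for x, g in zip(wr, gr)] for wr, gr in zip(w, gw)]
              for w, gw in zip(ws, grads)]
    return losses


def run(init, seed=0, width=8, depth=3, lr=0.01, steps=1500):
    rng = random.Random(seed)
    target = [[1.0, 0.5], [-0.3, 0.8]]  # hypothetical 2 x 2 target Phi
    dims = [2] + [width] * (depth - 1) + [2]
    ws = [init(rng, dims[i + 1], dims[i]) for i in range(depth)]
    return train(ws, target, lr, steps)


if __name__ == "__main__":
    for init in (gaussian_layer, orthogonal_layer):
        losses = run(init)
        print(init.__name__, losses[0], losses[-1])
```

Running the script prints the first and last recorded loss for each initialization. Plain lists are used instead of an array library so the sketch has no dependencies; for larger widths or depths, a vectorized implementation would be the practical choice.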
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: General Machine Learning (ie none of the above)