Improved Overparametrization Bounds for Global Convergence of SGD for Shallow Neural Networks

Bartlomiej Polaczyk, Jacek Cyranka

Published: 01 Jan 2023; Last Modified: 13 Jun 2023; Trans. Mach. Learn. Res. 2023; Readers: Everyone

Abstract: We study the overparametrization bounds required for the global convergence of the stochastic gradient descent (SGD) algorithm for a class of one-hidden-layer feed-forward neural networks with the ReLU activation function. We improve on the existing state-of-the-art results in terms of the required hidden-layer width. We introduce a new proof technique that combines nonlinear analysis with properties of random initializations of the network.
