Improved Overparametrization Bounds for Global Convergence of SGD for Shallow Neural Networks
Abstract: We study the overparametrization bounds required for the global convergence of the stochastic gradient descent algorithm for a class of one-hidden-layer feed-forward neural networks equipped with the ReLU activation function. We improve the existing state-of-the-art results in terms of the required hidden layer width. We introduce a new proof technique combining nonlinear analysis with properties of random initializations of the network.
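As a concrete illustration of the setting the abstract describes, the sketch below trains a one-hidden-layer ReLU network with plain SGD from a random Gaussian initialization in an overparametrized regime (hidden width much larger than the sample count). This is only an assumed toy instantiation for intuition, not the paper's construction: the dimensions, the 1/sqrt(m) output scaling, the fixed second layer, and the learning rate are all choices made here for illustration.

```python
import numpy as np

# Illustrative sketch (assumptions, not the paper's method): SGD on a
# one-hidden-layer network f(x) = sum_r a_r * relu(w_r . x), with the
# hidden weights W randomly initialized and the output weights a_r fixed
# to random signs scaled by 1/sqrt(m), a common overparametrized setup.
rng = np.random.default_rng(0)

n, d, m = 50, 5, 512                  # samples, input dim, hidden width (m >> n)
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # unit-norm inputs
y = np.sin(X @ rng.normal(size=d))                # synthetic regression targets

W = rng.normal(size=(m, d))                       # random Gaussian init
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed output layer

def forward(W, x):
    """Network output for a single input x."""
    return a @ np.maximum(W @ x, 0.0)

lr = 0.5
for step in range(5000):
    i = rng.integers(n)                           # one random sample per step (SGD)
    x, t = X[i], y[i]
    h = W @ x
    pred = a @ np.maximum(h, 0.0)
    # Gradient of the squared loss 0.5*(pred - t)^2 w.r.t. W;
    # the ReLU derivative is the indicator 1[h_r > 0].
    grad = (pred - t) * np.outer(a * (h > 0), x)
    W -= lr * grad

mse = np.mean([(forward(W, X[i]) - y[i]) ** 2 for i in range(n)])
```

With the width large relative to the number of samples, the training error typically shrinks to near zero from a generic random initialization, which is the empirical phenomenon that global-convergence overparametrization bounds aim to explain.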
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Added the camera-ready version.
Assigned Action Editor: ~Joan_Bruna1
Submission Number: 569