Improved Overparametrization Bounds for Global Convergence of SGD for Shallow Neural Networks

Published: 15 Mar 2023, Last Modified: 17 Sept 2024. Accepted by TMLR.
Abstract: We study the overparametrization bounds required for the global convergence of the stochastic gradient descent (SGD) algorithm for a class of one-hidden-layer feed-forward neural networks equipped with the ReLU activation function. We improve the existing state-of-the-art results in terms of the required hidden-layer width. We introduce a new proof technique combining nonlinear analysis with properties of random initializations of the network.
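To make the setting concrete, here is a minimal NumPy sketch of single-sample SGD on a one-hidden-layer ReLU network with fixed second-layer weights. The width m, step size, data, and initialization scale are illustrative assumptions, not the paper's exact setup or bounds; see the linked repository for the authors' code.

```python
# Minimal sketch (illustrative, not the paper's method): single-sample SGD
# on f(x) = a^T ReLU(W x) with the output layer a held fixed.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 100, 10, 1024                           # samples, input dim, hidden width (assumed values)
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # unit-norm inputs
y = rng.standard_normal(n)

W = rng.standard_normal((m, d)) / np.sqrt(m)      # random Gaussian hidden-layer init
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed output weights

def forward(x):
    return a @ np.maximum(W @ x, 0.0)             # f(x) = a^T ReLU(W x)

lr = 0.5                                          # illustrative step size
for step in range(2000):
    i = rng.integers(n)                           # sample one training point
    x_i, y_i = X[i], y[i]
    pre = W @ x_i                                 # pre-activations
    err = forward(x_i) - y_i
    # gradient of (1/2)(f(x_i) - y_i)^2 w.r.t. W
    grad_W = err * np.outer(a * (pre > 0), x_i)
    W -= lr * grad_W

loss = 0.5 * np.mean([(forward(X[i]) - y[i]) ** 2 for i in range(n)])
print(f"final training loss: {loss:.4e}")
```

With a sufficiently large width m, the squared loss is driven close to zero; the paper studies how small m can be while still guaranteeing such global convergence.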
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Added the camera-ready version.
Code: https://github.com/MIMUW-RL/improved_overparametrization_tmlr
Assigned Action Editor: ~Joan_Bruna1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 569