Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: \\ Global Convergence Guarantees and Feature Learning

Published: 07 Nov 2023, Last Modified: 13 Dec 2023 · M3L 2023 Poster
Keywords: overparameterization, gradient descent, gradient flow, shallow neural network, node scaling, global convergence
TL;DR: We show that training very large shallow neural networks, with an additional positive scale parameter associated with each node of the hidden layer, converges to a global minimum.
Abstract: We consider gradient-based optimisation of wide, shallow neural networks whose hidden-node outputs are scaled by positive scale parameters. The scale parameters are non-identical, departing from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that, for large networks, with high probability, gradient flow converges to a global minimum and, unlike in the NTK regime, can learn features.
Submission Number: 42
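To make the parameterisation concrete, the sketch below (not the authors' code) builds a shallow ReLU network whose hidden-node outputs are multiplied by fixed, non-identical positive scales lambda_j, and runs gradient descent as a discrete-time surrogate for gradient flow. The sqrt(lambda_j) placement, the normalised 1/j decay of the scales, the ReLU activation, and the toy regression target are all illustrative assumptions; the NTK regime corresponds to the symmetric choice lambda_j = 1/width.

```python
# Minimal illustrative sketch of asymmetrical node scaling (assumptions noted above).
import jax
import jax.numpy as jnp

width = 1024                              # number of hidden nodes (illustrative)
key = jax.random.PRNGKey(0)
k1, k2, kx = jax.random.split(key, 3)

# Non-identical positive scales; here a 1/j decay normalised to sum to one (assumption).
raw = 1.0 / jnp.arange(1, width + 1)
lam = raw / raw.sum()

params = {
    "W": jax.random.normal(k1, (width, 2)),   # input weights, input dimension 2
    "a": jax.random.normal(k2, (width,)),     # output weights
}

def forward(params, x):
    # Hidden-node outputs scaled by sqrt(lambda_j); NTK scaling is the special
    # case lambda_j = 1/width for all j.
    h = jax.nn.relu(params["W"] @ x)
    return jnp.sum(params["a"] * jnp.sqrt(lam) * h)

def loss(params, xs, ys):
    preds = jax.vmap(lambda x: forward(params, x))(xs)
    return jnp.mean((preds - ys) ** 2)

# Gradient descent on a toy regression problem as a surrogate for gradient flow.
xs = jax.random.normal(kx, (32, 2))
ys = jnp.sin(xs[:, 0])
lr = 0.1
for _ in range(200):
    grads = jax.grad(loss)(params, xs, ys)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```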