On feature learning in neural networks with global convergence guarantees

Zhengdao Chen; Eric Vanden-Eijnden; Joan Bruna

On feature learning in neural networks with global convergence guarantees

Zhengdao Chen, Eric Vanden-Eijnden, Joan Bruna

Published: 28 Jan 2022, Last Modified: 13 Feb 2023ICLR 2022 PosterReaders: Everyone

Keywords: neural networks, feature learning, gradient descent, global convergence

Abstract: We study the gradient flow optimization of over-parameterized neural networks (NNs) in a setup that allows feature learning while admitting non-asymptotic global convergence guarantees. First, we prove that for wide shallow NNs under the mean-field (MF) scaling and with a general class of activation functions, when the input dimension is at least the size of the training set, the training loss converges to zero at a linear rate under gradient flow. Building upon this analysis, we study a model of wide multi-layer NNs with random and untrained weights in earlier layers, and also prove a linear-rate convergence of the training loss to zero, regardless of the input dimension. We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.

One-sentence Summary: Gradient flow can induce feature learning in shallow and multi-layer neural networks while admitting non-asymptotic guarantees of global convergence.

5 Replies

Loading