Escaping mediocrity: how two-layer networks learn hard generalized linear models

Published: 26 Oct 2023, Last Modified: 13 Dec 2023 · NeurIPS 2023 Workshop Oral
Keywords: two-layer, neural network, single index model, SGD, high dimension
TL;DR: We study the sample complexity of two-layer networks learning a single-index target under SGD, showing that overparameterization improves convergence by at most a constant factor and that stochasticity plays only a minor role in this regime.
Abstract: This study examines the sample complexity required for two-layer neural networks to learn a generalized linear target function under Stochastic Gradient Descent (SGD), focusing on the challenging regime where many flat directions are present at initialization. It is well established that in this scenario $n=O(d\log d)$ samples are typically needed. However, we provide precise results on the pre-factors in high dimensions and for varying widths. Notably, our findings suggest that overparameterization can improve convergence by at most a constant factor within this problem class. These insights are grounded in a reduction of the SGD dynamics to a stochastic process in lower dimensions, where escaping mediocrity amounts to computing an exit time. Yet, we demonstrate that a deterministic approximation of this process adequately captures the escape time, implying that the role of stochasticity may be minimal in this scenario.
Submission Number: 50
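
The setting described in the abstract can be made concrete with a small simulation. The sketch below is a minimal, hypothetical illustration of the problem only: a two-layer network trained by online SGD on a single-index target whose link has a flat gradient signal at random initialization. It is not the paper's reduction or its exit-time analysis, and all dimensions, step sizes, the ReLU student, the fixed second layer, and the choice of link function are assumptions made for illustration.

```python
import numpy as np

# Illustrative assumptions: d, width, lr, n_steps, the ReLU student, the fixed
# second layer, and the He2 link are placeholders, not the paper's exact setup.
rng = np.random.default_rng(0)

d, width = 256, 8        # input dimension and hidden width (the overparameterization knob)
n_steps = 40 * d         # on the order of the n = O(d log d) samples discussed above
lr = 0.5 / d             # dimension-scaled step size

# Teacher direction for the single-index (generalized linear) target y = g(<w*, x>).
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)

def target(x):
    # "Hard" link g(z) = z^2 - 1 (second Hermite polynomial): its gradient signal
    # is flat at random initialization, the regime the abstract refers to.
    z = x @ w_star
    return z ** 2 - 1.0

# Two-layer student f(x) = a . ReLU(W x), with only the first layer trained.
W = rng.standard_normal((width, d)) / np.sqrt(d)
a = np.ones(width) / width

overlaps = []
for _ in range(n_steps):
    x = rng.standard_normal(d)       # one fresh Gaussian sample per step (online SGD)
    y = target(x)
    pre = W @ x
    hidden = np.maximum(pre, 0.0)
    err = a @ hidden - y
    # Gradient of the squared loss 0.5 * err^2 with respect to W.
    grad_W = np.outer(err * a * (pre > 0), x)
    W -= lr * grad_W
    # Alignment of the best hidden unit with the teacher direction; "escaping mediocrity"
    # corresponds to this overlap leaving the O(1/sqrt(d)) scale it has at initialization.
    overlaps.append(np.max(np.abs(W @ w_star) / np.linalg.norm(W, axis=1)))

print(f"max overlap with w*: start ~ {overlaps[0]:.3f}, end ~ {overlaps[-1]:.3f}")
```

Tracking the overlap between hidden units and the teacher direction is one way to visualize the escape from the flat region; the paper's results concern how many such steps are needed as a function of the dimension and the width.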