Keywords: mean-field Langevin dynamics, feature learning, multi-index models, neural networks, gradient descent
TL;DR: We provide guarantees for learning multi-index models with two-layer neural networks via the mean-field Langevin algorithm, i.e., noisy gradient descent.
Abstract: We study the problem of learning multi-index models in high dimensions using a two-layer neural network trained with the mean-field Langevin algorithm. Under mild distributional assumptions on the data, we characterize an effective dimension $d_{\mathrm{eff}}$ that controls both sample and computational complexity, by exploiting the adaptivity of neural networks to latent low-dimensional structure. When the data exhibit such structure, $d_{\mathrm{eff}}$ can be significantly smaller than the ambient dimension. We prove that the sample complexity grows almost linearly with $d_{\mathrm{eff}}$, bypassing the limitations of the information exponent and the leap complexity that appear in recent analyses of gradient-based feature learning. On the other hand, the computational complexity can grow exponentially with $d_{\mathrm{eff}}$ in the worst case.
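To make the TL;DR concrete, below is a minimal, illustrative sketch of the algorithm class described in the abstract: noisy gradient descent (a discretization of mean-field Langevin dynamics) on a mean-field two-layer ReLU network, fit to a toy single-index target. All data shapes, hyperparameters, and the target function here are placeholder choices for illustration, not the paper's actual experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-index data: y = ReLU(<u, x>) for a hidden direction u
# (an illustrative stand-in for the multi-index models in the abstract).
d, n, m = 8, 256, 64                      # ambient dim, samples, neurons
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
X = rng.standard_normal((n, d))
y = np.maximum(X @ u, 0.0)

# Mean-field two-layer network: f(x) = (1/m) * sum_j a_j * ReLU(<w_j, x>).
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m)

lr, temp, lam = 0.05, 1e-4, 1e-4          # step size, temperature 1/beta, L2 weight

def loss(W, a):
    pred = np.maximum(X @ W.T, 0.0) @ a / m
    return np.mean((pred - y) ** 2)

loss0 = loss(W, a)
for _ in range(2000):
    h = np.maximum(X @ W.T, 0.0)          # (n, m) hidden activations
    resid = h @ a / m - y                 # residuals of the current predictor
    mask = (X @ W.T) > 0                  # ReLU subgradient indicator
    grad_a = h.T @ resid / (n * m) + lam * a
    grad_W = (mask * resid[:, None]).T @ X * a[:, None] / (n * m) + lam * W
    # Noisy gradient step: gradient descent plus Gaussian noise scaled by
    # sqrt(2 * lr * temperature), i.e. a discretized Langevin update per neuron.
    a -= lr * grad_a
    a += np.sqrt(2 * lr * temp) * rng.standard_normal(m)
    W -= lr * grad_W
    W += np.sqrt(2 * lr * temp) * rng.standard_normal((m, d))

print(loss0, loss(W, a))
```

The injected noise keeps the empirical neuron distribution close to the Gibbs-type stationary distribution of the mean-field dynamics; with the noise terms removed, the loop reduces to plain regularized gradient descent.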
Student Paper: Yes
Submission Number: 53