Analysing feature learning of gradient descent using periodic functions

Published: 16 Jun 2024, Last Modified: 15 Jul 2024
Venue: HiLD at ICML 2024 Poster
License: CC BY 4.0
Keywords: Feature Learning, Curse of Dimensionality
TL;DR: We show that a single gradient update mitigates the curse of dimensionality via feature learning.
Abstract: We present an analysis of feature learning in neural networks when the target function is a periodic function applied to a one-dimensional projection of the input. Previously, \citet{alexandru2022neural} considered a similar question for target functions of the form $f^*(x) = p^*(\langle u_1,x\rangle,\ldots,\langle u_r,x\rangle)$ for some vectors $u_1,\ldots,u_r \in \mathbb{R}^d$ and a polynomial $p^*$, and proved that feature learning occurs during the training of a shallow neural network, even when the first-layer weights of the network are updated only once during training. Here, feature learning refers to a subset of the first-layer weights $w_1,\ldots,w_m \in \mathbb{R}^d$ of the trained network aligning with the directions of $\{u_1,\ldots,u_r\}$. We show that for periodic target functions, the same single gradient-based update of the first-layer weights induces feature learning in a shallow neural network, despite the additional challenge that feature learning for periodic functions now involves both the directions and the magnitudes of $\{u_1,\ldots,u_r\}$: a useful feature of, say, $f^*(x) = \sin(\langle u,x\rangle)$ is a vector $w \in \mathbb{R}^d$ such that $\angle(w,u) \approx 0$ and $\|w\| \approx \|u\|$. Our theoretical result shows that, thanks to the feature learning induced by gradient-based training, the sample complexity for learning a periodic target function of a restricted form with a shallow neural network grows only polynomially with the input dimension. Experimental results further support our theoretical finding and illustrate the benefits of feature learning for a broader class of periodic target functions.
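Code illustration (not part of the paper; the architecture details, hyperparameters, and variable names below are illustrative assumptions): a minimal NumPy sketch of the setting described above, taking a single full-batch gradient step on the first-layer weights of a shallow ReLU network trained on $f^*(x)=\sin(\langle u,x\rangle)$, and then checking how closely the best updated weight matches $u$ in both angle and norm. Whether this step recovers the magnitude of $u$ is exactly the paper's claim; the snippet only shows how one could measure it.

```python
# Toy illustration of a single gradient step on first-layer weights for a
# periodic single-index target f*(x) = sin(<u, x>). All sizes and the
# learning rate are arbitrary illustrative choices, not the paper's values.
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 50, 200, 5000            # input dim, hidden width, sample size
u = rng.standard_normal(d)
u *= 2.0 / np.linalg.norm(u)       # ground-truth feature with ||u|| = 2

X = rng.standard_normal((n, d))    # Gaussian inputs
y = np.sin(X @ u)                  # periodic target values

W = rng.standard_normal((m, d)) / np.sqrt(d)   # first-layer init
a = rng.choice([-1.0, 1.0], size=m) / m        # second layer, kept fixed

def forward(X, W, a):
    h = np.maximum(X @ W.T, 0.0)   # ReLU features, shape (n, m)
    return h @ a

# One full-batch gradient step on W for the squared loss, a held fixed.
resid = forward(X, W, a) - y                       # (n,)
act = (X @ W.T > 0).astype(float)                  # ReLU derivative, (n, m)
grad_W = ((resid[:, None] * act) * a).T @ X / n    # (m, d)
lr = 20.0
W1 = W - lr * grad_W

# Alignment of each updated weight with u: cosine and norm ratio.
cos = (W1 @ u) / (np.linalg.norm(W1, axis=1) * np.linalg.norm(u))
best = np.argmax(np.abs(cos))
print(f"best |cos(w, u)| = {abs(cos[best]):.3f}, "
      f"||w|| / ||u|| = {np.linalg.norm(W1[best]) / np.linalg.norm(u):.3f}")
```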
Student Paper: Yes
Submission Number: 52