Keywords: Feature Learning, Curse of Dimensionality
TL;DR: We show that a single gradient update mitigates the curse of dimensionality via feature learning.
Abstract: We present an analysis of feature learning in neural networks when target functions are defined by periodic functions
applied to one-dimensional projections of the input. Previously, \citet{alexandru2022neural} considered a similar question for
target functions of the form $f^*(x) = p^*(\langle u_1,x\rangle,\ldots,\langle u_r,x\rangle)$ for some
vectors $u_1,\ldots,u_r \in \mathbb{R}^d$ and polynomial $p^*$, and proved that feature learning occurs during the training of a shallow neural network,
even when the first-layer weights of the network are updated only once during training. Here feature learning refers to
a subset of the first-layer weights $w_1,\ldots,w_m \in \mathbb{R}^d$ of the trained network aligning with the directions of $u_1,\ldots,u_r$.
We show that for periodic target functions, the same single gradient-based update of the first-layer weights induces feature learning
in a shallow neural network, despite the additional challenge that feature learning for periodic functions now involves
both directions and magnitudes of $\{u_1,\ldots,u_r\}$: a useful feature of, say, $f^*(x) = \sin(\langle u,x\rangle)$ is a vector $w \in \mathbb{R}^d$
such that $\angle(w,u) \approx 0$ and $\|w\| \approx \|u\|$.
Our theoretical result shows that the sample complexity for learning a periodic target function of this restricted form with a shallow neural network grows only polynomially with the input dimension, thanks to the feature learning induced by gradient-based training.
Experimental results further support our theoretical findings and illustrate the benefits of feature learning for a broader class of periodic target functions.
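The following is a minimal, self-contained sketch of the phenomenon described above, not the paper's exact construction: a shallow ReLU network with squared loss, Gaussian inputs, and the specific choices of dimension $d$, width $m$, sample size $n$, step size, and target $f^*(x) = \sin(\langle u, x\rangle)$ are all illustrative assumptions. It performs a single gradient step on the first-layer weights and reports, before and after, the best cosine alignment $|\cos\angle(w, u)|$ and the norm ratio $\|w\|/\|u\|$ of the most aligned neuron.

```python
# Hedged illustration only: a single gradient step on the first-layer weights of a
# shallow ReLU network trained toward the periodic single-index target sin(<u, x>).
# All hyperparameters below are assumptions for illustration, not the paper's values.
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 50, 200, 5000                # input dim, network width, sample size (assumed)
u = rng.normal(size=d)
u *= 2.0 / np.linalg.norm(u)           # hidden feature with nontrivial norm ||u|| = 2

X = rng.normal(size=(n, d))            # Gaussian inputs (assumed), so <u, x> ~ N(0, ||u||^2)
y = np.sin(X @ u)                      # periodic single-index target

# Shallow network f(x) = sum_j a_j * relu(<w_j, x>) with small random first layer
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / m

# One gradient step on the first layer only, under the squared loss
pre = X @ W.T                          # (n, m) pre-activations
act = np.maximum(pre, 0.0)
resid = act @ a - y                    # (n,) residuals
grad_W = ((resid[:, None] * a) * (pre > 0)).T @ X / n   # (m, d) gradient w.r.t. W
eta = 50.0                             # large step size (assumption)
W1 = W - eta * grad_W

def alignment(Wmat):
    """Best |cos(w_j, u)| over neurons, and the norm ratio ||w_j|| / ||u|| for that neuron."""
    cos = (Wmat @ u) / (np.linalg.norm(Wmat, axis=1) * np.linalg.norm(u) + 1e-12)
    j = int(np.argmax(np.abs(cos)))
    return abs(cos[j]), np.linalg.norm(Wmat[j]) / np.linalg.norm(u)

print("best |cos(w, u)|, ||w|| / ||u||  before step:", alignment(W))
print("best |cos(w, u)|, ||w|| / ||u||  after  step:", alignment(W1))
```

Under this kind of setup, the post-step weights may show markedly better alignment with $u$ than the random initialization; the actual guarantees, scalings, and conditions are those stated in the paper, not this sketch.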
Student Paper: Yes
Submission Number: 52