Abstract: We investigate the time complexity of SGD learning on fully-connected neural networks with isotropic data. We put forward a complexity measure, {\it the leap}, which measures how “hierarchical” target functions are.
For $d$-dimensional uniform Boolean or isotropic Gaussian data, our main conjecture is that the time complexity for SGD to learn a function $f$ with low-dimensional support is controlled by its leap, i.e., it is $$\tilde{\Theta}\big(d^{\max(\mathrm{Leap}(f),\,2)}\big)\,.$$
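To make the leap concrete, below is a minimal sketch (ours, for exposition only) that assumes the leap of $f$ is the minimum, over orderings of its monomial support, of the largest number of previously unseen coordinates that any single monomial introduces; the function `leap` and the brute-force search over orderings are illustrative and not an algorithm from the paper.

```python
from itertools import permutations

def leap(monomials):
    """Leap of a function whose monomial support is `monomials`
    (each monomial given as a set of coordinate indices): the minimum
    over orderings of the support of the largest number of new
    coordinates introduced by any single monomial."""
    best = None
    for order in permutations(monomials):
        seen, worst_jump = set(), 0
        for S in order:
            worst_jump = max(worst_jump, len(set(S) - seen))
            seen |= set(S)
        best = worst_jump if best is None else min(best, worst_jump)
    return best

# Staircase f(x) = x1 + x1 x2 + x1 x2 x3: coordinates enter one at a time.
print(leap([{1}, {1, 2}, {1, 2, 3}]))   # -> 1
# Isolated degree-3 parity f(x) = x1 x2 x3: three coordinates at once.
print(leap([{1, 2, 3}]))                # -> 3
```

Under this reading, the staircase $x_1 + x_1x_2 + x_1x_2x_3$ has leap 1 (the merged-staircase regime of [Abbe et al.'22]), while the isolated parity $x_1x_2x_3$ has leap 3 and would, per the conjecture, require $\tilde{\Theta}(d^3)$ SGD steps.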
We prove a version of this conjecture for a specific class of functions on isotropic Gaussian data, learned by 2-layer neural networks, under additional technical assumptions on how SGD is run. We show that the training sequentially learns the function support with a saddle-to-saddle dynamic. Our result departs from [Abbe et al.'22] by going beyond leap 1 (merged-staircase functions), and by going beyond the mean-field and gradient flow approximations that prohibit the full complexity control obtained here. Finally, we note that this gives an SGD complexity scaling that matches that of Correlational Statistical Query (CSQ) lower bounds.
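As a rough numerical illustration of this sequential picture, one can train a small 2-layer ReLU network with online SGD on a leap-1 staircase target and track how much first-layer weight mass concentrates on the support coordinates. The architecture, hyperparameters, data distribution (uniform Boolean), and concentration metric below are our own choices for exposition and are not the setting of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, batch, lr, steps = 30, 256, 64, 0.05, 20001

# Leap-1 staircase target on uniform Boolean inputs: f(x) = x1 + x1 x2 + x1 x2 x3.
def target(x):
    return x[:, 0] + x[:, 0] * x[:, 1] + x[:, 0] * x[:, 1] * x[:, 2]

# 2-layer network f_hat(x) = a . relu(W x + b), with a small second-layer init.
W = rng.normal(0.0, 1.0 / np.sqrt(d), (width, d))
b = np.zeros(width)
a = rng.normal(0.0, 1e-3, width)

for t in range(steps):
    x = rng.choice([-1.0, 1.0], size=(batch, d))   # fresh samples each step: online SGD
    y = target(x)
    pre = x @ W.T + b
    h = np.maximum(pre, 0.0)
    err = h @ a - y                                  # residual of the squared loss
    ga = h.T @ err / batch                           # gradient w.r.t. second layer a
    gh = np.outer(err, a) * (pre > 0.0)              # backprop through the ReLU
    W -= lr * (gh.T @ x / batch)
    b -= lr * gh.mean(axis=0)
    a -= lr * ga
    if t % 5000 == 0:
        # fraction of first-layer weight mass on the true support {x1, x2, x3}
        mass = np.linalg.norm(W[:, :3]) / np.linalg.norm(W)
        print(f"step {t:6d}  support mass {mass:.3f}")
```

With a small second-layer initialization one would expect the second-layer weights to move first while $W$ barely changes, and the support coordinates to be picked up in stages; this is meant only to convey the flavor of the saddle-to-saddle analysis, not to reproduce it.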