Depth separation and weight-width trade-offs for sigmoidal neural networks

Amit Deshpande; Navin Goyal; Sushrut Karmalkar

12 Feb 2018 (modified: 15 Feb 2018), ICLR 2018 Workshop Submission

Keywords: depth separation, sigmoidal neural networks, low-degree polynomials

TL;DR: Depth-2 vs. depth-3 separation in $L_2$-norm for sigmoidal neural networks over a large class of input distributions.

Abstract: Recent work has shown strong separation between the expressive power of depth-$2$ and depth-$3$ neural networks. These separation results exhibit a function and an input distribution such that the function is well-approximable in $L_{2}$-norm on the input distribution by a depth-$3$ neural network of polynomial size, but any depth-$2$ neural network that well-approximates it requires exponential size. A limitation of these results is that they work only for carefully chosen functions and input distributions that are arguably not natural enough. We provide a simple proof of $L_{2}$-norm separation between the expressive power of depth-$2$ and depth-$3$ sigmoidal neural networks for a large class of input distributions, assuming their weights are polynomially bounded. Our proof is simpler than previous proofs, uses known low-degree multivariate polynomial approximations to neural networks, and gives the first depth-$2$-vs-depth-$3$ separation that works for a large class of input distributions.
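
The argument rests on the fact that a sigmoidal unit with bounded weights is well approximated in $L_2$-norm by a low-degree polynomial, and, roughly speaking, that larger weights call for higher degree. The following is a minimal one-dimensional numerical sketch of that phenomenon, not the authors' construction. It assumes a standard Gaussian input, fits polynomials by ordinary least squares (NumPy's `Polynomial.fit`) instead of using the explicit known approximations the paper relies on, and estimates the $L_2$ error empirically on samples. The helper names `sigmoid` and `poly_l2_error` are hypothetical.

```python
import numpy as np


def sigmoid(t):
    """Logistic sigmoid 1 / (1 + e^{-t})."""
    return 1.0 / (1.0 + np.exp(-t))


def poly_l2_error(weight, degree, n_samples=20_000, seed=0):
    """Empirical L2 error of the least-squares polynomial of the given degree
    fitted to x -> sigmoid(weight * x), with x ~ N(0, 1).

    Assumptions (illustrative only): the input distribution is a 1-D standard
    Gaussian, and the L2 norm is estimated by the root-mean-square residual
    on the same samples used for fitting.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    y = sigmoid(weight * x)
    # Polynomial.fit maps x onto [-1, 1] internally, which keeps the
    # least-squares problem well conditioned for moderate degrees.
    p = np.polynomial.Polynomial.fit(x, y, degree)
    return float(np.sqrt(np.mean((p(x) - y) ** 2)))


if __name__ == "__main__":
    # Odd degrees only: sigmoid(w * x) - 1/2 is an odd function of x.
    # Expected trend: error shrinks with degree and grows with |weight|.
    for weight in (1.0, 4.0, 16.0):
        errors = [poly_l2_error(weight, d) for d in (1, 3, 5, 9)]
        print(f"weight={weight:5.1f}  " + "  ".join(f"{e:.2e}" for e in errors))
```

Running the script prints one row of errors per weight. Only odd degrees are tried because $\sigma(-t) = 1 - \sigma(t)$, so $\sigma(wx) - 1/2$ is odd and even-degree terms contribute essentially nothing.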