- Keywords: deep learning, universal approximation, deep narrow networks
- Abstract: The classical Universal Approximation Theorem certifies that the universal approximation property holds for the class of neural networks of arbitrary width. Here we consider the natural `dual' theorem for width-bounded networks of arbitrary depth. Precisely, let $n$ be the number of inputs neurons, $m$ be the number of output neurons, and let $\rho$ be any nonaffine continuous function, with a continuous nonzero derivative at some point. Then we show that the class of neural networks of arbitrary depth, width $n + m + 2$, and activation function $\rho$, exhibits the universal approximation property with respect to the uniform norm on compact subsets of $\mathbb{R}^n$. This covers every activation function possible to use in practice; in particular this includes polynomial activation functions, making this genuinely different to the classical case. We go on to consider extensions of this result. First we show an analogous result for a certain class of nowhere differentiable activation functions. Second we establish an analogous result for noncompact domains, by showing that deep narrow networks with the ReLU activation function exhibit the universal approximation property with respect to the $p$-norm on $\mathbb{R}^n$. Finally we show that width of only $n + m + 1$ suffices for `most' activation functions.
- Original Pdf: pdf
0 Replies