On the number of inference regions of deep feed forward networks with piece-wise linear activations

Razvan Pascanu, Guido F. Montufar, Yoshua Bengio

Dec 23, 2013 (modified: Dec 23, 2013) ICLR 2014 conference submission readers: everyone
  • Decision: submitted, no decision
  • Abstract: This paper explores the complexity of deep feed forward networks with linear presynaptic couplings and rectified linear activations. This is a contribution to the growing body of work contrasting the representational power of deep and shallow network architectures. In particular, we offer a framework for comparing deep and shallow models that belong to the family of piece-wise linear functions based on computational geometry. We look at a deep (two hidden layers) rectifier multilayer perceptron (MLP) with linear outputs units and compare it with a single layer version of the model. In the asymptotic regime as the number of units goes to infinity, if the shallow model has $2n$ hidden units and $n_0$ inputs, then the number of linear regions is $O(n^{n_0})$. A two layer model with $n$ number of hidden units on each layer has $Omega(n^{n_0})$. We consider this as a first step towards understanding the complexity of these models and argue that better constructions in this framework might provide more accurate comparisons (especially for the interesting case of when the number of hidden layers goes to infinity).