Abstract: We study, from a theoretical point of view, the loss surface of neural networks that involve only rectified linear unit (ReLU) nonlinearities. Any such network defines a piecewise multilinear form in parameter space. As a consequence, optima of such networks generically occur in non-differentiable regions of parameter space, and so any understanding of such networks must carefully take into account their non-smooth nature. We then proceed to leverage this multilinear structure in an analysis of a neural network with one hidden layer. Under the assumption of linearly separable data, the piecewise bilinear structure of the loss allows us to provide an explicit description of all critical points.
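As a rough illustration of the piecewise multilinear claim (a minimal sketch, not taken from the paper): for a one-hidden-layer ReLU network f(W, v; x) = vᵀ ReLU(W x), on any region of parameter space where the activation pattern D = 1[W x > 0] is constant, the output reduces to vᵀ diag(D) W x, which is bilinear, i.e. linear in W for fixed v and linear in v for fixed W. The numpy check below freezes the pattern and verifies both linearities numerically.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=5)          # single input
    W = rng.normal(size=(7, 5))     # hidden-layer weights
    v = rng.normal(size=7)          # output weights

    D = (W @ x > 0).astype(float)   # frozen activation pattern

    def f_frozen(W, v):
        """Network output with the activation pattern held fixed at D."""
        return v @ (D * (W @ x))

    # Linearity in W for fixed v: f(aW1 + bW2, v) = a f(W1, v) + b f(W2, v)
    W2 = rng.normal(size=(7, 5))
    a, b = 2.0, -0.5
    assert np.isclose(f_frozen(a * W + b * W2, v),
                      a * f_frozen(W, v) + b * f_frozen(W2, v))

    # Linearity in v for fixed W
    v2 = rng.normal(size=7)
    assert np.isclose(f_frozen(W, a * v + b * v2),
                      a * f_frozen(W, v) + b * f_frozen(W, v2))

The bilinearity holds only while the frozen pattern D is valid; crossing a boundary where some hidden unit switches on or off changes D, which is what makes the overall loss piecewise (rather than globally) multilinear and non-smooth at those boundaries.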