Understanding Local Minima in Neural Networks by Loss Surface Decomposition


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission
  • Abstract: To provide principled ways of designing proper Deep Neural Network (DNN) models, it is essential to understand the loss surface of DNNs under realistic assumptions. We introduce several observations that help in understanding the local minima and overall structure of the loss surface. The parameter domain of the loss surface can be decomposed into regions in which activation values (zero or one for rectified linear units) are consistent. We found that, in each region, the loss surface has properties similar to those of linear neural networks, where every local minimum is a global minimum. This means that every differentiable local minimum is the global minimum of the corresponding region. We prove this for a neural network with one hidden layer of rectified linear units under realistic assumptions. There are poor regions that lead to poor local minima, and we explain why such regions exist even in overparameterized DNNs.
  • TL;DR: The loss surface of neural networks is a disjoint union of regions where every local minimum is a global minimum of the corresponding region.
  • Keywords: neural network, local minima, global minima, saddle point, optimization, loss surface, rectified linear unit, loss surface decomposition, gradient descent
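
The region decomposition described in the abstract can be illustrated with a minimal sketch. It assumes a one-hidden-layer ReLU network with scalar output and hand-picked toy values. The function names (`activation_pattern`, `relu_net`, `masked_net`) and all numbers are hypothetical and not taken from the paper. The sketch only shows that, while the zero/one activation pattern on the data stays fixed, the ReLU network coincides with a network whose activations are replaced by that fixed mask. It does not reproduce the paper's proof.

```python
def preact(W, b, x):
    # Pre-activation of every hidden unit for one input x.
    return [sum(wk * xk for wk, xk in zip(row, x)) + bj for row, bj in zip(W, b)]

def activation_pattern(W, b, X):
    # a[i][j] = 1 if hidden unit j is active on sample i, else 0.
    return [[1 if p > 0 else 0 for p in preact(W, b, x)] for x in X]

def relu_net(W, b, v, x):
    # Ordinary one-hidden-layer ReLU network with scalar output.
    return sum(vj * max(0.0, p) for vj, p in zip(v, preact(W, b, x)))

def masked_net(W, b, v, x, mask):
    # Same network with the ReLU replaced by a fixed 0/1 mask.
    return sum(vj * aj * p for vj, aj, p in zip(v, mask, preact(W, b, x)))

# Toy data and parameters (assumed values, chosen away from region boundaries).
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, -2.0]]
W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, 0.1]
v = [1.0, -2.0]

A = activation_pattern(W, b, X)  # this pattern identifies the region

# A small perturbation that keeps every activation unchanged stays in the region.
W_near = [[w + 0.01 for w in row] for row in W]
same_region = activation_pattern(W_near, b, X) == A

# Inside the region, the ReLU network and the masked network agree on the data.
gap_near = max(abs(relu_net(W_near, b, v, x) - masked_net(W_near, b, v, x, a))
               for x, a in zip(X, A))

# Flipping the sign of a hidden unit leaves the region, and the two models differ.
W_far = [[-1.0, 1.0], [0.5, 0.5]]
left_region = activation_pattern(W_far, b, X) != A
gap_far = max(abs(relu_net(W_far, b, v, x) - masked_net(W_far, b, v, x, a))
              for x, a in zip(X, A))
```

Here `gap_near` is zero because the mask `A` matches the true activations at `W_near`, whereas `gap_far` is positive because the mask from the original region no longer describes the network at `W_far`.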