Learning Deep Models: Critical Points and Local Openness


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone
  • Abstract: With increasing interest in a deeper understanding of the loss surface of non-convex deep models, this paper presents a unifying framework for studying the local/global equivalence of the optimization problems that arise in training such models. Using the local openness property of the underlying training models, we provide sufficient conditions under which any local optimum of the resulting optimization problem is global. Our result unifies and extends many existing results in the literature. For example, our theory shows that when the input data matrix X has full row rank, all non-degenerate local optima of the optimization problem for training a deep linear model with squared error loss are global minima. Moreover, for two-layer linear models, we show that all degenerate critical points are either global minima or second-order saddles, and that the non-degenerate local optima are global. Unlike many existing results in the literature, our result makes no assumption on the target data matrix Y. For non-linear deep models that have a certain pyramidal structure and invertible activation functions, we show global/local equivalence with no assumption on the differentiability of the activation function. Our results are a direct consequence of our main theorem, which provides necessary and sufficient conditions for the matrix multiplication mapping to be locally open in its range.
  • Keywords: Deep Learning, Local and Global minima, Local Openness
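
The argument summarized in the abstract can be sketched in symbols. This is a minimal sketch and not the paper's own statement: the notation ($W_i$, $h$, $\mathcal{M}$, $\mathcal{D}$, $\mathcal{R}_{\mathcal{M}}$, $\ell$, $B_\varepsilon$) is assumed here for illustration, and the definition of local openness is the standard one, which may differ in detail from the paper's. It shows the deep linear training problem, what it means for a mapping to be locally open in its range, and why that property carries local optimality from the weights to the range.

```latex
% Deep linear model with h layers and squared error loss (notation assumed):
\min_{W_1,\dots,W_h} \; \bigl\| W_h W_{h-1} \cdots W_1 X - Y \bigr\|_F^2

% Write the objective as a composition \ell(\mathcal{M}(w)), where
%   \mathcal{M}(W_1,\dots,W_h) = W_h \cdots W_1   (matrix multiplication mapping)
%   \ell(Z) = \| Z X - Y \|_F^2                   (convex in Z)

% Standard definition: \mathcal{M} : \mathcal{D} \to \mathcal{R}_{\mathcal{M}}
% is locally open at w \in \mathcal{D} (in its range) if
\forall \varepsilon > 0 \;\; \exists \delta > 0 : \quad
B_\delta\bigl(\mathcal{M}(w)\bigr) \cap \mathcal{R}_{\mathcal{M}}
\;\subseteq\;
\mathcal{M}\bigl(B_\varepsilon(w) \cap \mathcal{D}\bigr)

% Consequence: if \bar{w} is a local minimum of \ell \circ \mathcal{M} and
% \mathcal{M} is locally open at \bar{w}, then \bar{z} = \mathcal{M}(\bar{w})
% is a local minimum of \ell over \mathcal{R}_{\mathcal{M}}:
\ell(\bar{z}) \le \ell(z)
\quad \text{for all } z \in B_\delta(\bar{z}) \cap \mathcal{R}_{\mathcal{M}}
```

If, in addition, every local minimum of $\ell$ over $\mathcal{R}_{\mathcal{M}}$ is global (for instance when $\ell$ is convex and the range is a convex set), then $\bar{w}$ is a global minimum of the training problem.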