Open Peer Review. Open Publishing. Open Access. Open Discussion. Open Directory. Open Recommendations. Open API. Open Source.
No Spurious Local Minima in a Two Hidden Unit ReLU Network
Chenwei Wu, Jiajun Luo, Jason D. Lee
Feb 12, 2018 (modified: Feb 15, 2018)ICLR 2018 Workshop Submissionreaders: everyoneShow Bibtex
Abstract:Deep learning models can be efficiently optimized via stochastic gradient descent, but there is little theoretical evidence to support this. A key question in optimization is to understand when the optimization landscape of a neural network is amenable to gradient-based optimization. We focus on a simple neural network two-layer ReLU network with two hidden units, and show that all local minimizers are global. This combined with recent work of Lee et al. (2017); Lee et al. (2016) show that gradient descent converges to the global minimizer.
TL;DR:Recovery guarantee of stochastic gradient descent with random initialization for learning a two-layer neural network with two hidden nodes, unit-norm weights, ReLU activation functions and Gaussian inputs.
Keywords:Non-convex optimization, Deep Learning
Enter your feedback below and we'll get back to you as soon as possible.