Open Peer Review. Open Publishing. Open Access. Open Discussion. Open Directory. Open Recommendations. Open API. Open Source.
Learning One-hidden-layer Neural Networks with Landscape Design
Rong Ge, Jason D. Lee, Tengyu Ma
Feb 15, 2018 (modified: Feb 15, 2018)ICLR 2018 Conference Blind Submissionreaders: everyoneShow Bibtex
Abstract:We consider the problem of learning a one-hidden-layer neural network: we assume the input x is from Gaussian distribution and the label $y = a \sigma(Bx) + \xi$, where a is a nonnegative vector and $B$ is a full-rank weight matrix, and $\xi$ is a noise vector. We first give an analytic formula for the population risk of the standard squared loss and demonstrate that it implicitly attempts to decompose a sequence of low-rank tensors simultaneously.
Inspired by the formula, we design a non-convex objective function $G$ whose landscape is guaranteed to have the following properties:
1. All local minima of $G$ are also global minima.
2. All global minima of $G$ correspond to the ground truth parameters.
3. The value and gradient of $G$ can be estimated using samples.
With these properties, stochastic gradient descent on $G$ provably converges to the global minimum and learn the ground-truth parameters. We also prove finite sample complexity results and validate the results by simulations.
TL;DR:The paper analyzes the optimization landscape of one-hidden-layer neural nets and designs a new objective that provably has no spurious local minimum.
Keywords:theory, non-convex optimization, loss surface
Enter your feedback below and we'll get back to you as soon as possible.