Escaping Flat Areas via Function-Preserving Structural Network Modifications

27 Sep 2018 (modified: 21 Dec 2018), ICLR 2019 Conference Blind Submission
  • Abstract: Hierarchically embedding smaller networks in larger networks, e.g., by increasing the number of hidden units, has been studied since the 1990s. The main interest was in understanding possible redundancies in the parameterization, as well as in studying how such embeddings affect critical points. We take these results as a point of departure to devise a novel strategy for escaping from flat regions of the error surface and to address the slow-down that gradient-based methods experience on plateaus around saddle points. The idea is to expand the dimensionality of a network in a way that guarantees the existence of new escape directions. We call this operation the opening of a tunnel. One may then continue with the larger network either temporarily, i.e., closing the tunnel later, or permanently, i.e., iteratively growing the network whenever needed. We develop our method for fully connected as well as convolutional layers. Moreover, we present a practical version of our algorithm that requires no modification of the network structure and can be deployed plug-and-play in any current deep learning framework. Experimentally, our method shows significant speed-ups.
  • Keywords: deep learning, cnn, structural modification, optimization, saddle point
  • TL;DR: If optimization gets stuck at a saddle point, we add a filter to a CNN in a way that preserves the network's function while creating new escape directions, so that the optimizer can leave the saddle.
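The abstract describes enlarging a network without changing the function it computes. Below is a minimal NumPy sketch of one generic way to do this, assuming the classical duplicate-and-split embedding: copy a hidden unit (or a convolutional filter), give the copy the same incoming weights and bias, and divide the original unit's outgoing weights between the two in ratios λ and 1 - λ. This is not necessarily the authors' exact tunnel-opening procedure, and the helpers `split_unit`, `mlp`, `cnn`, `conv2d` and the weight-layout conventions are hypothetical names chosen for illustration.

```python
import numpy as np


def split_unit(W_in, b_in, W_out, j, lam=0.5):
    """Hypothetical helper: duplicate unit/filter j while preserving the function.

    Assumed layout: W_in has hidden units (or output channels) along axis 0,
    b_in is the matching bias, and W_out consumes those units along axis 1.
    Works for dense (W_in: (H, D), W_out: (O, H)) and conv
    (W_in: (C1, C0, kh, kw), W_out: (C2, C1, kh, kw)) weights.
    """
    # The copy gets identical incoming weights and bias, so it fires exactly like unit j.
    W_in_new = np.concatenate([W_in, W_in[j:j + 1]], axis=0)
    b_in_new = np.concatenate([b_in, b_in[j:j + 1]])
    # The copy's outgoing weights are (1 - lam) times those of unit j ...
    W_out_new = np.concatenate([W_out, (1.0 - lam) * W_out[:, j:j + 1]], axis=1)
    # ... and unit j keeps lam times its old outgoing weights, so the sum is unchanged.
    W_out_new[:, j] = lam * W_out[:, j]
    return W_in_new, b_in_new, W_out_new


def relu(z):
    return np.maximum(z, 0.0)


def mlp(x, W1, b1, W2, b2):
    # x: (batch, D); one ReLU hidden layer.
    return relu(x @ W1.T + b1) @ W2.T + b2


def conv2d(x, W, b):
    # Valid cross-correlation; x: (C_in, H, W), W: (C_out, C_in, kh, kw).
    kh, kw = W.shape[2:]
    win = np.lib.stride_tricks.sliding_window_view(x, (kh, kw), axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", win, W) + b[:, None, None]


def cnn(x, W1, b1, W2, b2):
    # Two conv layers with a ReLU in between.
    return conv2d(relu(conv2d(x, W1, b1)), W2, b2)


rng = np.random.default_rng(0)

# Fully connected: 4 inputs -> 3 hidden units -> 2 outputs; split unit 1.
x = rng.normal(size=(5, 4))
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)
W1s, b1s, W2s = split_unit(W1, b1, W2, j=1, lam=0.3)
print(W1s.shape, W2s.shape)  # (4, 4) (2, 4)
print(np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1s, b1s, W2s, b2)))  # True

# Convolutional: 2 channels -> 3 filters -> 4 filters, 3x3 kernels; split filter 0.
xc = rng.normal(size=(2, 8, 8))
K1, c1 = rng.normal(size=(3, 2, 3, 3)), rng.normal(size=3)
K2, c2 = rng.normal(size=(4, 3, 3, 3)), rng.normal(size=4)
K1s, c1s, K2s = split_unit(K1, c1, K2, j=0, lam=0.5)
print(K1s.shape, K2s.shape)  # (4, 2, 3, 3) (4, 4, 3, 3)
print(np.allclose(cnn(xc, K1, c1, K2, c2), cnn(xc, K1s, c1s, K2s, c2)))  # True
```

Because the enlarged network computes the same outputs, the loss is unchanged; the gradients with respect to the two copies' incoming weights are λ and 1 - λ times the original gradient, so a critical point of the smaller network remains a critical point of the larger one. The added parameters only contribute new directions in which the optimizer may move. How the paper selects which unit or filter to split, and how it chooses λ, is not specified in this abstract.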