Escaping Flat Areas via Function-Preserving Structural Network Modifications

Yannic Kilcher; Gary Bécigneul; Thomas Hofmann

27 Sept 2018 (modified: 05 May 2023); ICLR 2019 Conference Blind Submission; Readers: Everyone

Abstract: Hierarchically embedding smaller networks in larger networks, e.g., by increasing the number of hidden units, has been studied since the 1990s. The main interest was in understanding possible redundancies in the parameterization, as well as how such embeddings affect critical points. We take these results as a point of departure to devise a novel strategy for escaping from flat regions of the error surface and to address the slowdown that gradient-based methods experience on saddle-point plateaus. The idea is to expand the dimensionality of a network in a way that guarantees the existence of new escape directions. We call this operation the opening of a tunnel. One may then continue with the larger network either temporarily, i.e., closing the tunnel later, or permanently, i.e., iteratively growing the network whenever needed. We develop our method for fully connected as well as convolutional layers. Moreover, we present a practical version of our algorithm that requires no modification of the network structure and can be deployed plug-and-play in any current deep learning framework. Experimentally, our method shows significant speed-ups.
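The abstract does not spell out how a tunnel is opened. The sketch below illustrates only the general principle of a function-preserving expansion, using a generic construction that is an assumption and not necessarily the authors' method: append a hidden unit with zero outgoing weight to a one-hidden-layer tanh network. The network function, and hence the loss, is unchanged, yet the partial derivative of the loss with respect to the new outgoing weight is generally nonzero, so gradient descent gains a direction that did not exist in the smaller parameter space. All names (`forward`, `loss`, `add_unit`, `grad_new_output_weight`) and the toy data are hypothetical.

```python
import math

# Hypothetical illustration of a function-preserving widening;
# not the paper's tunnel-opening algorithm.

def forward(params, x):
    """One-hidden-layer tanh network: y = sum_j v[j] * tanh(W[j] . x + b[j])."""
    W, b, v = params
    return sum(vj * math.tanh(sum(wi * xi for wi, xi in zip(wj, x)) + bj)
               for wj, bj, vj in zip(W, b, v))

def loss(params, data):
    """Mean squared error over a list of (x, y) pairs."""
    return sum((forward(params, x) - y) ** 2 for x, y in data) / len(data)

def add_unit(params, w_new, b_new):
    """Append a hidden unit with outgoing weight 0, so the function is unchanged."""
    W, b, v = params
    return (W + [list(w_new)], b + [b_new], v + [0.0])

def grad_new_output_weight(params, data):
    """Analytic dLoss/dv for the last hidden unit: mean of 2 * (f(x) - y) * h_last(x)."""
    W, b, _ = params
    g = 0.0
    for x, y in data:
        h_last = math.tanh(sum(wi * xi for wi, xi in zip(W[-1], x)) + b[-1])
        g += 2.0 * (forward(params, x) - y) * h_last
    return g / len(data)

# Usage on toy data (hypothetical values).
base = ([[1.0]], [0.0], [1.0])
data = [([0.5], 0.0), ([-1.0], 0.3), ([2.0], 1.5)]
wide = add_unit(base, w_new=[0.7], b_new=0.2)
print(loss(base, data) == loss(wide, data))  # same function, hence same loss
print(grad_new_output_weight(wide, data))    # nonzero here: a new descent direction
```

Because the new outgoing weight starts at zero, the gradient with respect to the new unit's incoming weights is initially zero; only the outgoing weight provides the immediate escape direction. If the smaller network sits at a critical point with nonzero residuals, this partial derivative is still generally nonzero for a generic choice of incoming weights.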

Keywords: deep learning, cnn, structural modification, optimization, saddle point

TL;DR: If optimization gets stuck in a saddle, we add a filter to a CNN in a specific way in order to escape the saddle.