Understanding Very Deep Networks via Volume Conservation
Thomas Unterthiner, Sepp Hochreiter
Feb 18, 2016 (modified: Feb 18, 2016), ICLR 2016 workshop submission
Abstract: Recently, very deep neural networks have set new records across many application domains, such as Residual Networks on the ImageNet challenge and Highway Networks on language processing tasks. We expect further excellent performance improvements in different fields from these very deep networks. However, these networks are still poorly understood, especially since they rely on non-standard architectures.
In this contribution we analyze the learning dynamics required for successfully training very deep neural networks. For the analysis we use a symplectic network architecture which inherently conserves volume when mapping a representation from one layer to the next. It therefore avoids the vanishing gradient problem, which in turn allows one to effectively train thousands of layers. We consider highway and residual networks as well as the LSTM model, all of which have approximately volume-conserving mappings.
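The link between near volume conservation and the vanishing gradient can be illustrated with a toy backpropagation experiment (a sketch under our own assumptions, not the paper's actual architecture): multiplying a gradient vector through many contractive layer Jacobians drives it to zero, whereas near-identity Jacobians of the form I + εJ, as induced by residual-style mappings, keep its norm roughly constant.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 32, 1000

def rand_jac():
    # Random layer Jacobian scaled to spectral norm 0.9; a stand-in for
    # a contractive (e.g. saturating) plain feed-forward layer.
    W = rng.normal(size=(n, n)) / np.sqrt(n)
    return 0.9 * W / np.linalg.norm(W, 2)

g_plain = np.ones(n)  # gradient backpropagated through plain layers
g_res = np.ones(n)    # gradient backpropagated through residual-style layers

for _ in range(depth):
    J = rand_jac()
    g_plain = J.T @ g_plain               # plain: shrinks geometrically
    g_res = g_res + 0.01 * (J.T @ g_res)  # residual: (I + eps*J)^T, near volume conserving

print(np.linalg.norm(g_plain))  # vanishes toward 0 after 1000 layers
print(np.linalg.norm(g_res))    # stays on the order of its initial value
```

The hypothetical scaling factor 0.01 on the residual branch is only chosen to keep the total Jacobian close to the identity; the qualitative contrast is what matters.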
We identify two important factors for making deep architectures work:
(1) (near) volume conserving mappings of the form $x \mapsto x + f(x)$ or similar (cf.\ avoiding the vanishing gradient);
(2) controlling the drift effect, which increases or decreases $x$ during propagation toward the output (cf.\ avoiding bias shifts).
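Volume conservation means the layer's Jacobian has determinant of magnitude one. A minimal numerical check, using an additive-coupling layer as an illustrative stand-in (this is not necessarily the authors' symplectic architecture; the function name and use of `tanh` are our own assumptions):

```python
import numpy as np

def coupling_layer(x, W):
    # Additive coupling: the second half of x is shifted by a function of the
    # first half. The Jacobian is triangular with a unit diagonal, so the
    # mapping conserves volume (|det J| = 1) regardless of W.
    d = len(x) // 2
    y = x.copy()
    y[d:] += np.tanh(W @ x[:d])
    return y

def numeric_jacobian(f, x, eps=1e-6):
    # Central-difference Jacobian of f at x.
    n = len(x)
    J = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        J[:, i] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W = rng.normal(size=(2, 2))
J = numeric_jacobian(lambda v: coupling_layer(v, W), x)
print(abs(np.linalg.det(J)))  # ~1.0: the mapping conserves volume
```

By contrast, a mapping with |det J| far from 1 systematically expands or shrinks the representation, which relates to the drift effect in factor (2).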