Make Deep Networks Shallow Again

Bernhard Bermeitinger; Tomas Hrycej; Siegfried Handschuh

Published: 01 Jan 2023, Last Modified: 22 Jan 2025 | KDIR 2023 | CC BY-SA 4.0

Abstract: Deep neural networks have a good success record and are thus viewed as the best architecture choice for complex applications. Their main shortcoming has long been the vanishing gradient, which prevented numerical optimization algorithms from converging acceptably. An important special case of network architecture, frequently used in computer vision applications, consists of a stack of layers of the same dimension. For this architecture, a breakthrough was achieved with the concept of residual connections: an identity mapping parallel to a conventional layer. This concept substantially alleviates the vanishing gradient problem and is thus widely used. The focus of this paper is to show that the deep stack of residual layers can be substituted with a shallow architecture with comparable expressive power and similarly good convergence properties. A stack of residual layers can be expressed as an expansion of terms similar to the Taylor expansion. This ex
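The expansion idea mentioned in the abstract can be illustrated with a minimal sketch, under a strong simplifying assumption that is not taken from the paper: every residual layer is purely linear, x -> x + W x, with no nonlinearity. In that case the stack is the matrix product (I + W_n)...(I + W_1), which multiplies out into terms grouped by order: the identity, the sum of the individual W_k, the pairwise products, and so on. The function names `residual_stack` and `expansion_terms` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def identity(n):
    """Return the n x n identity matrix as a list of lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    """Add two matrices elementwise."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def residual_stack(weights):
    """Compose linear residual layers x -> x + W x (assumption: no nonlinearity).

    The first matrix in `weights` is applied first, so the result is
    (I + W_n) ... (I + W_1).
    """
    n = len(weights[0])
    total = identity(n)
    for w in weights:
        total = matmul(matadd(identity(n), w), total)
    return total


def expansion_terms(weights):
    """Return the expanded product grouped by order.

    Entry k is the sum of all products of k distinct weight matrices,
    with later layers multiplied on the left.
    """
    n = len(weights[0])
    orders = []
    for order in range(len(weights) + 1):
        term_sum = [[0.0] * n for _ in range(n)]
        for subset in combinations(range(len(weights)), order):
            prod = identity(n)
            for idx in subset:  # ascending index: earlier layers act first
                prod = matmul(weights[idx], prod)
            term_sum = matadd(term_sum, prod)
        orders.append(term_sum)
    return orders
```

Summing all entries returned by `expansion_terms` reproduces `residual_stack` exactly in this linear setting. Keeping only the order-0 and order-1 entries gives I + W_1 + ... + W_n, a single parallel layer, which is close to the full stack when the weight matrices are small. Whether this matches the paper's actual construction cannot be confirmed from the visible part of the abstract.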
