Linearly Convergent Algorithms for Learning Shallow Residual Networks

ISIT 2019
Abstract: We propose and analyze algorithms for training ReLU networks with skip connections. Skip connections are the key feature of residual networks (ResNets), which have been shown to provide superior performance in deep learning applications. We analyze two approaches for training such networks, gradient descent and alternating minimization, and compare the convergence criteria of both methods. We show that under typical (Gaussianity) assumptions on the d-dimensional input data, both gradient descent and alternating minimization provably converge at a linear rate given a sufficiently good initialization; moreover, we show that a simple "identity" initialization suffices. Furthermore, we provide statistical upper bounds indicating that n = O(d³) samples suffice to achieve this convergence rate. To our knowledge, these constitute the first global parameter recovery guarantees for shallow ResNet-type networks with ReLU activations.
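
Below is a minimal illustrative sketch of the setting described in the abstract: gradient descent from an "identity" initialization for a one-hidden-layer ReLU network with a skip connection, trained on Gaussian data. The specific model form f(x) = 1ᵀ ReLU((I + W)x), the step size, and the sample size are assumptions chosen for illustration, not details taken from the paper.

```python
# Minimal sketch (assumptions, not the paper's exact setup): gradient descent with
# identity initialization for a shallow ReLU network with a skip connection,
# modeled here as f(x) = 1^T ReLU((I + W) x) with Gaussian inputs.
import numpy as np

rng = np.random.default_rng(0)
d = 10                      # input dimension
n = 10 * d**3               # sample size on the order of d^3, as in the abstract
W_true = 0.1 * rng.standard_normal((d, d))   # small ground-truth residual weights

def forward(W, X):
    """Shallow residual model: f(x) = 1^T ReLU((I + W) x)."""
    return np.maximum(X @ (np.eye(d) + W).T, 0.0).sum(axis=1)

# Gaussian input data and labels generated by the ground-truth model.
X = rng.standard_normal((n, d))
y = forward(W_true, X)

# Gradient descent on the squared loss, starting from the identity initialization W = 0.
W = np.zeros((d, d))
lr = 1.0 / (n * d)          # conservative step size (illustrative choice)
for it in range(201):
    pre = X @ (np.eye(d) + W).T                 # pre-activations, shape (n, d)
    resid = np.maximum(pre, 0.0).sum(axis=1) - y
    grad = ((pre > 0) * resid[:, None]).T @ X   # gradient of 0.5 * sum of squared residuals
    W -= lr * grad
    if it % 50 == 0:
        print(f"iter {it:3d}  parameter error {np.linalg.norm(W - W_true):.3e}")
```

Roughly speaking, an alternating-minimization variant would instead fix the current ReLU activation pattern at each step and solve the resulting least-squares problem in W; the plain gradient step above is only meant to illustrate the identity initialization and the linear decrease of the parameter error.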
