STABILITY AND CONVERGENCE THEORY FOR LEARNING RESNET: A FULL CHARACTERIZATION

Huishuai Zhang; Da Yu; Mingyang Yi; Wei Chen; Tie-yan Liu

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Blind Submission, Readers: Everyone

TL;DR: We characterize the stability and convergence of gradient descent for learning ResNet, unveiling the theoretical and practical importance of tau = 1/sqrt(L) in the residual block.

Abstract: The ResNet structure has achieved great success since its debut. In this paper, we study the stability of learning ResNet. Specifically, we consider the ResNet block $h_l = \phi(h_{l-1}+\tau\cdot g(h_{l-1}))$, where $\phi(\cdot)$ is the ReLU activation and $\tau$ is a scalar. We show that, for the standard initialization used in practice, $\tau = 1/\Omega(\sqrt{L})$ is a sharp value in characterizing the stability of the forward/backward process of ResNet, where $L$ is the number of residual blocks. Specifically, stability is guaranteed for $\tau\le 1/\Omega(\sqrt{L})$, while conversely the forward process explodes when $\tau>L^{-\frac{1}{2}+c}$ for a positive constant $c$. Moreover, if ResNet is properly over-parameterized, we show that for $\tau \le 1/\tilde{\Omega}(\sqrt{L})$ gradient descent is guaranteed to find the global minima (here $\tilde{\Omega}(\cdot)$ hides logarithmic factors). This significantly enlarges the range $\tau\le 1/\tilde{\Omega}(L)$ that admits global convergence in previous work. We also demonstrate that the over-parameterization requirement of ResNet depends only weakly on the depth, which corroborates the advantage of ResNet over the vanilla feedforward network. Empirically, with $\tau\le 1/\sqrt{L}$, deep ResNet can be easily trained even without a normalization layer. Moreover, adding $\tau=1/\sqrt{L}$ can also improve the performance of ResNet with a normalization layer.
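The forward-stability claim can be illustrated with a small numerical sketch. This is not the authors' code or experiment. It assumes, purely for illustration, that $g$ is a single linear map $g(h) = W_l h$ with a fresh Gaussian matrix per block (entries of variance $1/m$ for width $m$), and that the input is a nonnegative unit vector. The function name `forward_norm_ratio` and the values of $L$ and $m$ are hypothetical choices. The sketch compares the output norm for $\tau = 1/\sqrt{L}$ against the unscaled block $\tau = 1$.

```python
import math
import random


def relu(v):
    """Elementwise ReLU on a list of floats."""
    return [x if x > 0.0 else 0.0 for x in v]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def forward_norm_ratio(tau, num_blocks, width, seed=0):
    """Return ||h_L|| / ||h_0|| for h_l = relu(h_{l-1} + tau * W_l h_{l-1}).

    Assumption (not from the source): g(h) = W_l h, where W_l has i.i.d.
    N(0, 1/width) entries drawn fresh for every block.
    """
    rng = random.Random(seed)
    # Nonnegative input, normalized so that ||h_0|| = 1.
    h = [abs(rng.gauss(0.0, 1.0)) for _ in range(width)]
    scale = norm(h)
    h = [x / scale for x in h]
    sigma = 1.0 / math.sqrt(width)
    for _ in range(num_blocks):
        # Each output coordinate is one row of W_l applied to h.
        g = [sum(rng.gauss(0.0, sigma) * x for x in h) for _ in range(width)]
        h = relu([x + tau * gx for x, gx in zip(h, g)])
    return norm(h)  # ||h_0|| = 1, so this is already the ratio


L = 64   # number of residual blocks (illustrative)
m = 32   # hidden width (illustrative)
stable = forward_norm_ratio(1.0 / math.sqrt(L), L, m)
unscaled = forward_norm_ratio(1.0, L, m)
print("tau = 1/sqrt(L):", stable)
print("tau = 1:        ", unscaled)
```

Under these assumptions, the output norm with $\tau = 1/\sqrt{L}$ should stay within a small constant factor of the input norm, while $\tau = 1$ should grow by many orders of magnitude. A single random draw at one depth only illustrates the scaling; it does not verify the sharpness result.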

Keywords: ResNet, stability, convergence theory, over-parameterization


