Approximation and non-parametric estimation of ResNet-type convolutional neural networks via block-sparse fully-connected neural networks

Kenta Oono; Taiji Suzuki

27 Sept 2018 (modified: 05 May 2023), ICLR 2019 Conference Blind Submission, Readers: Everyone

Abstract: We develop new approximation and statistical learning theories of convolutional neural networks (CNNs) via the ResNet-type structure, where the channel size, filter size, and width are fixed. We show that a ResNet-type CNN is a universal approximator and that its expressive ability is no worse than that of fully-connected neural networks (FNNs) with a block-sparse structure, even if the size of each layer in the CNN is fixed. Our result is general in the sense that any approximation rate achieved by block-sparse FNNs can be automatically translated into one achieved by CNNs. Thanks to this general theory, we show that learning on CNNs satisfies optimality in approximation and estimation of several important function classes. As applications, we consider two types of function classes to be estimated: the Barron class and the Hölder class. We prove that the clipped empirical risk minimization (ERM) estimator can achieve the same rate as FNNs even when the channel size, filter size, and width of the CNNs are constant with respect to the sample size. This rate is minimax optimal (up to logarithmic factors) for the Hölder class. Our proof is based on sophisticated evaluations of the covering number of CNNs and a non-trivial parameter rescaling technique that controls the Lipschitz constant of the CNNs to be constructed.
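
As a rough illustration of what a clipped ERM estimator looks like, the sketch below minimizes the empirical squared loss over a hypothesis class and then truncates the fitted function to a bounded range. The hypothesis class here is polynomials of a fixed degree, which is an assumption made purely for brevity; the paper's hypothesis class is ResNet-type CNNs. The function name `clipped_erm_fit`, the target function, and the clipping bound are likewise hypothetical choices for this example.

```python
import numpy as np


def clipped_erm_fit(x, y, degree, bound):
    """Fit by least squares (ERM under squared loss), then clip to [-bound, bound].

    Assumption: polynomials of the given degree stand in for the
    hypothesis class; this is not the CNN class studied in the paper.
    """
    # Empirical risk minimizer over polynomials of the given degree.
    coeffs = np.polynomial.polynomial.polyfit(x, y, degree)

    def predict(x_new):
        raw = np.polynomial.polynomial.polyval(x_new, coeffs)
        # Clipping step: truncate predictions to the assumed range of the target.
        return np.clip(raw, -bound, bound)

    return predict


# Synthetic regression data: a bounded target plus Gaussian noise (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(200)

f_hat = clipped_erm_fit(x, y, degree=5, bound=1.0)
# Evaluate outside the training range, where the unclipped polynomial can blow up.
preds = f_hat(np.linspace(-2.0, 2.0, 50))
```

Clipping leaves the estimator unchanged wherever the raw fit already lies within the bound, so it only matters where the unclipped minimizer overshoots the known range of the target.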

Keywords: CNN, ResNet, learning theory, approximation theory, non-parametric estimation, block-sparse

TL;DR: ResNet-type CNNs are universal approximators, and their expressive ability is no worse than that of fully-connected neural networks (FNNs) with a block-sparse structure, even if the size of each layer in the CNN is fixed.

