Keywords: deep learning, generalization, overparameterization, neural tangent kernel
Abstract: It has been recognized that heavily overparameterized artificial neural networks exhibit surprisingly good generalization performance in various machine-learning tasks. Recent theoretical studies have attempted to unveil this mystery of overparameterization. In most of those works, overparameterization is achieved by increasing the width of the network, while the effect of the depth of the network has remained less well understood. In this work, we investigate the effect of depth in the overparameterized regime for fully connected neural networks. To gain insight into the advantage of depth, we introduce local and global labels according to abstract but simple classification rules. It turns out that the locality of the feature relevant to a given classification rule plays a key role: our experimental results suggest that deeper is better for local labels, whereas shallower is better for global labels. We also compare the results of finite networks with those of the neural tangent kernel (NTK) and find that the NTK does not correctly capture the depth dependence of the generalization performance, which indicates the importance of feature learning rather than lazy learning.
One-sentence Summary: Whether depth is beneficial in deep learning for classification tasks depends on the locality of the relevant features.
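The abstract's distinction between "local" and "global" labels can be made concrete with a minimal sketch. The rules below are illustrative assumptions, not the paper's actual definitions: a local label depends on only a few input coordinates, while a global label depends on all of them.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))  # 1000 inputs with 50 features each

# Hypothetical "local" rule: the label depends on just the first 3 coordinates.
y_local = np.sign(np.prod(X[:, :3], axis=1)).astype(int)

# Hypothetical "global" rule: the label depends on every coordinate equally.
y_global = np.sign(X.sum(axis=1)).astype(int)
```

Under such rules, one can train networks of varying depth on (X, y_local) and (X, y_global) to probe how depth interacts with feature locality.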