Abstract: Empirical evidence suggests that neural networks with ReLU activations generalize better with over-parameterization. However, there is currently no theoretical analysis that explains this observation. In this work, we study a simplified learning task with over-parameterized convolutional networks that empirically exhibits the same qualitative phenomenon. For this setting, we provide a theoretical analysis of the optimization and generalization performance of gradient descent. Specifically, we prove data-dependent sample complexity bounds which show that over-parameterization improves the generalization performance of gradient descent.
Keywords: deep learning, theory, non-convex optimization, over-parameterization
TL;DR: We show, in a simplified learning task, that over-parameterization improves the generalization of a convnet trained with gradient descent.
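As a rough illustration of the setting described in the abstract, the sketch below trains one-hidden-layer convolutional (weight-sharing) ReLU networks of increasing width with full-batch gradient descent on a small synthetic detection task, then reports test accuracy for each width. The task, the architecture details (fixed ±1 readout, hinge loss), the widths, and the hyperparameters are illustrative assumptions, not the paper's exact construction; the code only makes concrete the kind of experiment in which wider models trained on the same small training set can generalize better.

```python
# Minimal sketch (assumed setup, not the paper's exact setting): compare
# test accuracy of over-parameterized convolutional ReLU nets of growing
# width, all trained with plain (full-batch) gradient descent.
import torch
import torch.nn as nn

torch.manual_seed(0)


def make_data(n, d=20, patches=4):
    # Hypothetical detection task (illustrative stand-in): positive examples
    # hide a fixed "signal" vector in one random patch, negatives are noise.
    signal = 2.0 * torch.ones(d) / d ** 0.5
    x = torch.randn(n, patches, d)
    y = torch.ones(n)
    y[1::2] = -1.0
    pos_idx = torch.arange(0, n, 2)                      # even rows are positives
    patch_idx = torch.randint(patches, (pos_idx.numel(),))
    x[pos_idx, patch_idx] += signal
    return x, y


class ConvReLUNet(nn.Module):
    # One weight-shared ("convolutional") ReLU layer with `width` channels,
    # max-pooling over patches, and a fixed +/-1 readout; only the
    # first-layer weights are trained.
    def __init__(self, d, width):
        super().__init__()
        self.w = nn.Parameter(0.1 * torch.randn(width, d))
        readout = torch.ones(width)
        readout[width // 2:] = -1.0
        self.register_buffer("readout", readout)

    def forward(self, x):                      # x: (n, patches, d)
        h = torch.relu(x @ self.w.t())         # (n, patches, width)
        h = h.max(dim=1).values                # max-pool over patches
        return h @ self.readout                # scalar score per example


def train_and_eval(width, x_tr, y_tr, x_te, y_te, steps=500, lr=0.05):
    net = ConvReLUNet(x_tr.shape[-1], width)
    opt = torch.optim.SGD(net.parameters(), lr=lr)       # full batch => plain GD
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.clamp(1.0 - y_tr * net(x_tr), min=0.0).mean()  # hinge loss
        loss.backward()
        opt.step()
    with torch.no_grad():
        return (torch.sign(net(x_te)) == y_te).float().mean().item()


x_tr, y_tr = make_data(50)       # small training set
x_te, y_te = make_data(5000)     # large test set to estimate generalization
for width in (2, 8, 32, 128):    # increasing over-parameterization
    acc = train_and_eval(width, x_tr, y_tr, x_te, y_te)
    print(f"width={width:4d}  test accuracy={acc:.3f}")
```

Whether a gap between narrow and wide networks appears, and how large it is, depends on the chosen task and hyperparameters; the sketch is only meant to make the experimental comparison concrete, not to reproduce the paper's results.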