Over-parameterization Improves Generalization in the XOR Detection Problem

Sep 27, 2018, ICLR 2019 Conference Blind Submission
  • Abstract: Empirical evidence suggests that neural networks with ReLU activations generalize better with over-parameterization. However, there is currently no theoretical analysis that explains this observation. In this work, we study a simplified learning task with over-parameterized convolutional networks that empirically exhibits the same qualitative phenomenon. For this setting, we provide a theoretical analysis of the optimization and generalization performance of gradient descent. Specifically, we prove data-dependent sample complexity bounds which show that over-parameterization improves the generalization performance of gradient descent.
  • Keywords: deep learning, theory, non-convex optimization, over-parameterization
  • TL;DR: We show in a simplified learning task that over-parameterization improves the generalization of a convnet trained with gradient descent.
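
To make the setting concrete, here is a minimal NumPy sketch of an XOR-detection task and an over-parameterized one-hidden-layer convolutional network trained with (sub)gradient descent on the hinge loss. The data distribution, architecture, and hyperparameters are illustrative assumptions for this sketch, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n_examples, n_patches):
    # Assumed XOR-detection data: each example is n_patches two-dimensional
    # patches with entries in {-1, +1}; the label is +1 iff some patch has
    # entries of differing signs (its "XOR" is on), else -1.
    X = rng.choice([-1.0, 1.0], size=(n_examples, n_patches, 2))
    y = np.where((X.prod(axis=2) == -1).any(axis=1), 1.0, -1.0)
    return X, y

def relu(z):
    return np.maximum(z, 0.0)

def forward(W, U, X):
    # One-hidden-layer convnet with weight sharing across patches and
    # max-pooling: sum_i max_j relu(w_i . x_j) - sum_i max_j relu(u_i . x_j).
    pos = relu(np.einsum('kd,npd->nkp', W, X)).max(axis=2).sum(axis=1)
    neg = relu(np.einsum('kd,npd->nkp', U, X)).max(axis=2).sum(axis=1)
    return pos - neg

def train(k=20, n_examples=200, n_patches=6, lr=0.05, steps=300):
    # k is the number of filters; large k means over-parameterization,
    # since two positive and two negative filters already suffice here.
    X, y = make_data(n_examples, n_patches)
    W = 0.1 * rng.standard_normal((k, 2))
    U = 0.1 * rng.standard_normal((k, 2))
    for _ in range(steps):
        out = forward(W, U, X)
        active = (y * out) < 1.0  # examples inside the hinge-loss margin
        gW, gU = np.zeros_like(W), np.zeros_like(U)
        for n in np.where(active)[0]:
            for i in range(k):
                # Subgradient flows only through the argmax patch of each filter.
                s = relu(W[i] @ X[n].T)
                j = s.argmax()
                if s[j] > 0:
                    gW[i] -= y[n] * X[n, j]
                s = relu(U[i] @ X[n].T)
                j = s.argmax()
                if s[j] > 0:
                    gU[i] += y[n] * X[n, j]
        W -= lr * gW / n_examples
        U -= lr * gU / n_examples
    return (np.sign(forward(W, U, X)) == y).mean()  # training accuracy
```

Note that this architecture can express a perfect detector: positive filters proportional to (1, -1) and (-1, 1) fire only on XOR patches, while negative filters (1, 1) and (-1, -1) fire only on same-sign patches, so scaling up the positive filters makes the sign of the output match the label exactly.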