Deep Learning: When Conventional Wisdom Fails to be Wise

16 May 2022 (modified: 05 May 2023) · NeurIPS 2022 Submitted · Readers: Everyone
Keywords: neural networks, deep learning, overparameterization, generalization, overfitting.
Abstract: A major tenet of conventional wisdom dictates that models should not be over-parameterized: the number of free parameters should not exceed the number of training data points. This tenet originates from centuries of shallow learning, primarily in the form of linear or logistic regression. It is routinely applied to all kinds of data analysis and modeling, and even used to infer properties of the brain. However, we show that this conventional wisdom is completely wrong as soon as one moves from shallow to deep learning. In particular, we construct sequences of both linear and non-linear deep learning models whose number of parameters can grow to arbitrarily large values, and which remain well defined and trainable using a fixed, finite-size training set. In deep models, the parameter space is partitioned into large equivalence classes. Learning can be viewed as a communication process in which information is communicated from the data to the synaptic weights. The information in the training data can only specify, and only needs to specify, an equivalence class of the parameters; it cannot, and does not need to, specify individual parameter values. As such, the number of training examples can be smaller than the number of free parameters.
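To make the over-parameterized setting described in the abstract concrete, the sketch below (an illustrative toy example, not the paper's actual construction) trains a small ReLU network whose parameter count vastly exceeds the number of training points on a fixed, finite dataset; the architecture, widths, optimizer, and synthetic data are arbitrary assumptions chosen for illustration.

    # Toy illustration: far more parameters than training examples,
    # yet the model is well defined and trainable on a fixed dataset.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    n_train, d_in, d_out = 20, 5, 1   # only 20 training examples
    width, depth = 256, 4             # ~1.3e5 parameters vs. 20 examples

    # Fixed, finite training set drawn from a simple linear teacher.
    X = torch.randn(n_train, d_in)
    y = X @ torch.randn(d_in, d_out)

    layers = [nn.Linear(d_in, width), nn.ReLU()]
    for _ in range(depth - 2):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers += [nn.Linear(width, d_out)]
    model = nn.Sequential(*layers)

    n_params = sum(p.numel() for p in model.parameters())
    print(f"{n_params} parameters, {n_train} training points")

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), y)
        loss.backward()
        opt.step()
    print(f"final training loss: {loss.item():.2e}")

In this sketch the data determine the model's behavior on the training set but cannot pin down individual weight values: many distinct weight settings (an equivalence class) realize the same fit, which is the communication-style view the abstract summarizes.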