- Abstract: In the last few years, deep learning has been tremendously successful in many applications. However, our theoretical understanding of deep learning, and thus the ability of providing principled improvements, seems to lag behind. A theoretical puzzle concerns the ability of deep networks to predict well despite their intriguing apparent lack of generalization: their classification accuracy on the training set is not a proxy for their performance on a test set. How is it possible that training performance is independent of testing performance? Do indeed deep networks require a drastically new theory of generalization? Or are there measurements based on the training data that are predictive of the network performance on future data? Here we show that when performance is measured appropriately, the training performance is in fact predictive of expected performance, consistently with classical machine learning theory.
- Keywords: deep learning, theory, generalization, cross-entropy loss, overfitting
- TL;DR: Contrary to previous beliefs, the training performance of deep networks, when measured appropriately, is predictive of test performance, consistent with classical machine learning theory.