Abstract: Understanding the remarkable generalization abilities of Deep Learning systems remains one of the significant scientific challenges of our time. It is widely accepted that the success of DNNs stems, at least partially, from having many hidden layers. However, the benefits of such depth are not universal. We introduce a simple experimental paradigm that contrasts CNNs and MLPs in this respect, demonstrating that contemporary architectures can leverage deep, multi-layered structures to systematically improve generalization. This behavior, however, conflicts with statistical learning theory (SLT) and its key concept of the bias-variance tradeoff. We therefore present an alternative framework for understanding the relationship between network architecture and generalization, in which classifiers are viewed as maps between metric spaces. Through comparative analysis, we show that deeper networks develop a bias towards smoother input representations, and that the inductive bias responsible for the superior generalization of deep CNNs is distinct from the standard “minimal complexity” (Occam’s razor) bias that is the focus of SLT.
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Gintare_Karolina_Dziugaite1
Submission Number: 3849