Position: Deep Learning is Not So Mysterious or Different

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 Position Paper Track · Spotlight Poster · CC BY 4.0
Abstract: Deep neural networks are often seen as different from other model classes, defying conventional notions of generalization. Popular examples of anomalous generalization behaviour include benign overfitting, double descent, and the success of overparametrization. This position paper argues that these phenomena are neither unique to neural networks nor particularly mysterious. Moreover, this generalization behaviour can be intuitively understood, and rigorously characterized, using long-standing generalization frameworks such as PAC-Bayes and countable hypothesis bounds. We present soft inductive biases as a key unifying principle in explaining these phenomena: rather than restricting the hypothesis space to avoid overfitting, embrace a flexible hypothesis space with a soft preference for simpler solutions that are consistent with the data. This principle can be encoded in many model classes, and thus deep learning is not as mysterious or different from other model classes as it might seem. However, we also highlight ways in which deep learning is relatively distinct, such as its capacity for representation learning, phenomena such as mode connectivity, and its relative universality.
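As a rough illustration (not drawn from the paper itself), the countable hypothesis bound mentioned above can be stated as follows: for a countable hypothesis class $\mathcal{H}$ with prior weights $p(h)$ summing to at most one, a loss bounded in $[0,1]$, and $n$ i.i.d. training samples, with probability at least $1-\delta$ every $h \in \mathcal{H}$ satisfies

\[ R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{\log \tfrac{1}{p(h)} + \log \tfrac{1}{\delta}}{2n}}, \]

where $R(h)$ is the expected loss and $\hat{R}(h)$ the training loss. The bound remains meaningful for arbitrarily large, even infinite, hypothesis spaces, provided the solutions actually reached carry substantial prior mass $p(h)$; this is one precise sense in which a soft preference for simpler solutions can replace a hard restriction of the hypothesis space.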
Lay Summary: Machine learning means algorithms that learn by example. In order to learn by example, we need to make assumptions. These assumptions can be represented by restricting the possible explanations for our observations. Alternatively, they can be represented as soft preferences for certain types of explanations over others. In this paper, we show that the latter way of representing assumptions can be used to intuitively explain many phenomena that are considered mysterious in deep learning. These phenomena include (1) the success of having more parameters than datapoints, (2) predictive accuracy that increases, decreases, and then increases again as model flexibility grows, and (3) the ability to fit both noise and signal while still making accurate predictions on withheld data. We show that classical statistical models also exhibit these phenomena, which can be explained in the same way. Moreover, in contrast to many claims, this behaviour can be understood with rigorous theoretical frameworks that have existed for many decades, even when considering the solutions that neural networks actually reach, rather than analogies given by simpler linear models that are easier to analyze. So deep learning is not so mysterious or different. However, we also consider other ways in which deep learning is genuinely different, including its relatively singular ability to work well in many different settings at once.
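A minimal sketch, assuming a setup not taken from the paper itself: the snippet below (all names and parameters are illustrative) fits a plain random-features linear regression with the minimum-norm least-squares solution, and typically shows phenomenon (2), double descent, with test error peaking near the interpolation threshold (around 40 features here) and falling again as the model becomes heavily overparametrized.

```python
# Sketch: double descent in a classical random-features linear model.
# Setup and hyperparameters are illustrative assumptions, not the paper's experiments.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, d=1, noise=0.2):
    """Noisy 1-D regression data: y = sum(sin(3x)) + Gaussian noise."""
    x = rng.uniform(-1.0, 1.0, size=(n, d))
    y = np.sin(3.0 * x).sum(axis=1) + noise * rng.standard_normal(n)
    return x, y

def random_features(x, W, b):
    """Random Fourier-style features cos(xW + b), a classical linear-model front end."""
    return np.cos(x @ W + b)

n_train, d = 40, 1
x_tr, y_tr = make_data(n_train, d)
x_te, y_te = make_data(2000, d)

for n_feat in [5, 10, 20, 35, 40, 45, 80, 200, 1000]:
    W = rng.standard_normal((d, n_feat))
    b = rng.uniform(0.0, 2.0 * np.pi, n_feat)
    Phi_tr = random_features(x_tr, W, b)
    Phi_te = random_features(x_te, W, b)
    # Minimum-norm least squares via the pseudo-inverse: once n_feat >= n_train,
    # the model interpolates the training data exactly, yet the smallest-norm
    # interpolant is selected among all solutions that fit the data.
    w = np.linalg.pinv(Phi_tr) @ y_tr
    test_mse = np.mean((Phi_te @ w - y_te) ** 2)
    print(f"features={n_feat:5d}  test MSE={test_mse:.3f}")
```

The minimum-norm choice among interpolating solutions acts as a soft inductive bias: the hypothesis space is very large, but simpler (smaller-norm) fits are preferred, the same principle the paper uses to explain benign overfitting and the success of overparametrization more broadly.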
Primary Area: Model Understanding, Explainability, Interpretability, and Trust
Keywords: Understanding, Generalization
Submission Number: 135