Position: Not All Explanations for Deep Learning Phenomena Are Equally Valuable

Published: 01 May 2025, Last Modified: 18 Jun 2025
Venue: ICML 2025 Position Paper Track (oral)
License: CC BY 4.0
TL;DR: Deep learning phenomena do offer practical value, but we should carefully consider where exactly that value lies.
Abstract:

Developing a better understanding of surprising or counterintuitive phenomena has constituted a significant portion of deep learning research in recent years. These phenomena include double descent, grokking, and the lottery ticket hypothesis, among many others. Works in this area often develop ad hoc hypotheses attempting to explain these observed phenomena on an isolated, case-by-case basis. This position paper asserts that, in many prominent cases, there is little evidence to suggest that these phenomena appear in real-world applications, and that these efforts may be inefficient in driving progress in the broader field. Consequently, we argue against viewing them as isolated puzzles that require bespoke resolutions or explanations. Nevertheless, we suggest that deep learning phenomena do still offer research value by providing unique settings in which we can refine our broad explanatory theories of more general deep learning principles. This position is reinforced by analyzing the research outcomes of several prominent examples of these phenomena from the recent literature. We revisit the current norms in the research community in approaching these problems and propose practical recommendations for future research, aiming to ensure that progress on deep learning phenomena is well aligned with the ultimate pragmatic goal of progress in the broader field of deep learning.

Lay Summary:

In this paper, we examine the methodology being used to study a particular sub-area of deep learning research that focuses on so-called deep learning phenomena. This area addresses interesting and unusual behavior observed in neural networks that can be isolated and analyzed. Although these phenomena (such as double descent, grokking, and the lottery ticket hypothesis) are heavily studied, we argue that many of them are unlikely to occur in practical applications. As such, treating them as puzzles to be solved on their own may not be the most productive research strategy. Instead, we suggest that their main value lies in how they can help us test and refine broader theories about how deep learning works. We provide examples of how this approach has led to useful insights and propose practical recommendations for making research in this area more aligned with the goals of the wider field.

Primary Area: Research Priorities, Methodology, and Evaluation
Keywords: Deep learning phenomena, double descent, grokking, lottery ticket hypothesis
Submission Number: 347