Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle

Published: 16 Feb 2024, Last Modified: 28 Mar 2024
Venue: BT@ICLR2024
License: CC BY 4.0
Keywords: double descent, interpretability, machine learning
Blogpost Url: https://iclr-blogposts.github.io/2024/blog/double-descent-demystified/
Abstract: Double descent is a surprising phenomenon in machine learning, in which, as the number of model parameters grows relative to the number of data points, test error drops as models grow ever larger into the highly overparameterized (data-undersampled) regime. This drop in test error flies in the face of classical learning theory on overfitting and has arguably underpinned the success of many large models in machine learning. In this work, we analytically dissect the simple setting of ordinary linear regression and show, intuitively and rigorously, when and why double descent occurs, without complex tools (e.g., statistical mechanics, random matrix theory). We identify three interpretable factors that, when simultaneously present, cause double descent: (1) how much the training features vary in each direction; (2) how much, and in which directions, the test features vary relative to the training features; (3) how well the best possible model in the model class can correlate the variance in the training features with the training targets. We demonstrate on real data that ordinary linear regression exhibits double descent, and that double descent disappears when we ablate any one of the three identified factors. We conclude by using our fresh perspective to shed light on recent observations in nonlinear models concerning superposition and double descent.
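The double descent curve the abstract describes can be reproduced in miniature with ordinary linear regression. The sketch below is an illustrative toy on synthetic Gaussian data, not the blog post's real-data experiments: targets depend on D latent features, the model sees only the first p of them, and minimum-norm least squares (via the pseudoinverse) is fit for several p. Test error typically spikes near the interpolation threshold p = n and falls again in the overparameterized regime p > n. All names and parameter values here are assumptions chosen for the demo.

```python
import numpy as np

def min_norm_fit_error(p, n=20, D=100, noise=0.5, trials=20, seed=0):
    """Average test MSE of minimum-norm least squares using the first p
    of D latent features, with n training samples (toy setup, averaged
    over random draws to smooth out seed-to-seed variation)."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        beta = rng.normal(size=D)                     # true linear targets
        X_tr = rng.normal(size=(n, D))
        y_tr = X_tr @ beta + noise * rng.normal(size=n)
        X_te = rng.normal(size=(500, D))
        y_te = X_te @ beta
        # pinv yields the least-squares solution when p <= n and the
        # minimum-norm interpolating solution when p > n.
        w = np.linalg.pinv(X_tr[:, :p]) @ y_tr
        errs.append(np.mean((X_te[:, :p] @ w - y_te) ** 2))
    return float(np.mean(errs))

# Underparameterized (p < n), interpolation threshold (p = n),
# and overparameterized (p > n) regimes:
errors = {p: min_norm_fit_error(p) for p in (5, 20, 100)}
# Expect a pronounced peak at p = n = 20, with lower error on either side.
```

The peak at p = n arises because the smallest singular value of the square training-feature matrix is tiny, so the fit amplifies noise and the variance from the unseen features, matching factor (1) in the abstract; growing p past n shrinks the minimum-norm solution and the error falls again.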
Ref Papers: https://arxiv.org/abs/1906.11300, https://arxiv.org/abs/1812.11118, https://arxiv.org/abs/1903.07571, https://arxiv.org/abs/1808.00387, https://arxiv.org/abs/1710.03667, https://arxiv.org/abs/1903.08560
Id Of The Authors Of The Papers: ~Mikhail_Belkin1, ~Jeffrey_Pennington1, ~Andrea_Montanari1, ~Song_Mei1
Conflict Of Interest: Conflicts with citations [16, 17, 18], but these represent only a small fraction of the citations overall.
Submission Number: 3