A Modern Take on the Bias-Variance Tradeoff in Neural Networks

Brady Neal; Sarthak Mittal; Aristide Baratin; Vinayak Tantia; Matthew Scicluna; Simon Lacoste-Julien; Ioannis Mitliagkas

Published: 04 Jun 2019, Last Modified: 05 May 2023, ICML Deep Phenomena 2019, Readers: Everyone

Keywords: bias-variance tradeoff, generalization, deep learning theory, concentration

TL;DR: We provide evidence against classical claims about the bias-variance tradeoff and propose a novel decomposition for variance.

Abstract: Recent empirical results on over-parameterized deep networks are marked by a striking absence of the classic U-shaped test error curve: test error keeps decreasing in wider networks. Researchers are actively working on bridging this discrepancy by proposing better complexity measures. Instead, we directly measure prediction bias and variance for four classification and regression tasks on modern deep networks. We find that both bias and variance can decrease as the number of parameters grows. Qualitatively, the phenomenon persists over a number of gradient-based optimizers. To better understand the role of optimization, we decompose the total variance into variance due to training set sampling and variance due to initialization. Variance due to initialization is significant in the under-parameterized regime. In the over-parameterized regime, total variance is much lower and dominated by variance due to sampling. We provide theoretical analysis in a simplified setting that is consistent with our empirical findings.
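One way to picture the variance decomposition mentioned in the abstract is through the law of total variance: Var(h) = E_S[Var_I(h | S)] + Var_S(E_I[h | S]), where S is the training set and I the initialization. The abstract does not spell out the exact estimator, so the following is only a minimal sketch under that assumption. The function name `decompose_variance` and the nested-list layout `preds[s][i]` are hypothetical, chosen for illustration, and the numbers are toy values, not results from the paper.

```python
from statistics import fmean, pvariance


def decompose_variance(preds):
    """Split prediction variance at one fixed test point.

    preds[s][i] is the scalar prediction of a model trained on training
    set s with initialization seed i (hypothetical layout). Every training
    set must have the same number of seeds so the identity holds exactly.
    """
    if len({len(row) for row in preds}) != 1:
        raise ValueError("each training set needs the same number of seeds")

    # E_S[Var_I(h | S)]: average over training sets of the variance across seeds.
    var_init = fmean(pvariance(row) for row in preds)

    # Var_S(E_I[h | S]): variance across training sets of the seed-averaged prediction.
    var_sampling = pvariance([fmean(row) for row in preds])

    # Total variance over all (training set, seed) pairs.
    total = pvariance([p for row in preds for p in row])
    return var_init, var_sampling, total


# Toy values: two training sets, two seeds each.
var_init, var_sampling, total = decompose_variance([[1.0, 3.0], [5.0, 7.0]])
print(var_init, var_sampling, total)
```

With equal numbers of seeds per training set, the two components sum to the total variance, which gives a quick sanity check on any measured decomposition.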
