Deep Nets Don't Learn via Memorization

David Krueger*; Nicolas Ballas*; Stanislaw Jastrzebski*; Devansh Arpit*; Maxinder S. Kanwal; Tegan Maharaj; Emmanuel Bengio; Asja Fischer; Aaron Courville

06 Jul 2025 (modified: 21 Feb 2017), ICLR 2017, Readers: Everyone

Abstract: We use empirical methods to argue that deep neural networks (DNNs) do not achieve their performance by memorizing training data, in spite of overly expressive model architectures. Instead, they learn a simple available hypothesis that fits the finite data samples. In support of this view, we establish that there are qualitative differences when learning noise vs. natural datasets, showing that: (1) more capacity is needed to fit noise, (2) time to convergence is longer for random labels, but shorter for random inputs, and (3) DNNs trained on real data examples learn simpler functions than when trained with noise data, as measured by the sharpness of the loss function at convergence. Finally, we demonstrate that for appropriately tuned explicit regularization, e.g., dropout, we can degrade DNN training performance on noise datasets without compromising generalization on real data.
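The abstract contrasts training on natural data with training on two kinds of noise: random labels and random inputs. It does not say how these noise datasets are built, so the following is only a minimal sketch of one plausible construction, not the authors' procedure. The function names (`randomize_labels`, `randomize_inputs`), the `fraction` parameter, and the choice of uniform labels and unit Gaussian inputs are all assumptions made for illustration.

```python
import random


def randomize_labels(labels, num_classes, fraction, seed=0):
    """Return a copy of `labels` in which a `fraction` of the entries is
    redrawn uniformly from range(num_classes).

    Assumption: uniform label noise. A redrawn label can coincide with
    the original one by chance, so the number of changed labels is at
    most (not exactly) fraction * len(labels).
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_noisy = int(round(fraction * len(noisy)))
    for i in rng.sample(range(len(noisy)), n_noisy):
        noisy[i] = rng.randrange(num_classes)
    return noisy


def randomize_inputs(inputs, fraction, seed=0):
    """Return a copy of `inputs` in which a `fraction` of the examples is
    replaced by i.i.d. standard Gaussian noise of the same length.

    Assumption: each example is a flat sequence of floats and the noise
    is N(0, 1); labels are left untouched by this function.
    """
    rng = random.Random(seed)
    noisy = [list(x) for x in inputs]
    n_noisy = int(round(fraction * len(noisy)))
    for i in rng.sample(range(len(noisy)), n_noisy):
        noisy[i] = [rng.gauss(0.0, 1.0) for _ in noisy[i]]
    return noisy


if __name__ == "__main__":
    # Toy data standing in for a real dataset (hypothetical values).
    labels = [i % 10 for i in range(100)]
    inputs = [[float(i), float(i + 1)] for i in range(100)]

    noisy_labels = randomize_labels(labels, num_classes=10, fraction=0.5)
    noisy_inputs = randomize_inputs(inputs, fraction=0.5)

    changed_labels = sum(a != b for a, b in zip(labels, noisy_labels))
    changed_inputs = sum(a != b for a, b in zip(inputs, noisy_inputs))
    print("labels changed:", changed_labels)
    print("inputs changed:", changed_inputs)
```

Setting `fraction` to 0 leaves the dataset unchanged, while 1 gives a pure-noise dataset. Training the same architecture on each variant would be one way to compare the capacity needed and the time to convergence for noise and natural data.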

TL;DR: Deep Nets Don't Learn via Memorization

Keywords: Deep learning, Optimization

Conflicts: umontreal.ca, polymtl.ca, uj.edu.pl, iai.uni-bonn.de
