Deep Nets Don't Learn via Memorization

David Krueger*, Nicolas Ballas*, Stanislaw Jastrzebski*, Devansh Arpit*, Maxinder S. Kanwal, Tegan Maharaj, Emmanuel Bengio, Asja Fischer, Aaron Courville

Feb 17, 2017 (modified: Feb 21, 2017), ICLR 2017 workshop submission
  • Abstract: We use empirical methods to argue that deep neural networks (DNNs) do not achieve their performance by memorizing training data, in spite of overly expressive model architectures. Instead, they learn a simple available hypothesis that fits the finite data samples. In support of this view, we establish that there are qualitative differences when learning noise vs. natural datasets, showing that: (1) more capacity is needed to fit noise, (2) time to convergence is longer for random labels, but shorter for random inputs, and (3) DNNs trained on real data examples learn simpler functions than when trained with noise data, as measured by the sharpness of the loss function at convergence. Finally, we demonstrate that for appropriately tuned explicit regularization, e.g., dropout, we can degrade DNN training performance on noise datasets without compromising generalization on real data.
  • TL;DR: Deep Nets Don't Learn via Memorization
  • Keywords: Deep learning, Optimization
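
The abstract contrasts training on natural data with training on two kinds of noise: random labels and random inputs. As a rough illustration of what such noise datasets could look like, the following is a minimal sketch only. The helper names `randomize_labels` and `randomize_inputs`, the uniform label distribution, and the Gaussian input noise matched to the data's overall mean and standard deviation are all assumptions made here for illustration; the listing does not specify the authors' actual procedure.

```python
import numpy as np

def randomize_labels(y, num_classes, rng):
    """Hypothetical helper: replace every label with a class drawn uniformly at random."""
    return rng.integers(0, num_classes, size=y.shape)

def randomize_inputs(x, rng):
    """Hypothetical helper: replace every input with Gaussian noise that
    matches the overall mean and standard deviation of the original data."""
    return rng.normal(loc=x.mean(), scale=x.std(), size=x.shape)

rng = np.random.default_rng(0)

# Stand-in "real" dataset: 100 examples, 8 features, 10 classes (assumed sizes).
x = rng.random((100, 8))
y = np.arange(100) % 10

y_noise = randomize_labels(y, num_classes=10, rng=rng)  # random-label variant
x_noise = randomize_inputs(x, rng)                      # random-input variant
```

A model trained on `(x, y_noise)` or `(x_noise, y)` can only fit the training set by memorizing it, which is the setting the abstract compares against training on `(x, y)`.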