Looking at Deep Learning Phenomena Through a Telescoping Lens

Alan Jeffares; Alicia Curth; Mihaela van der Schaar

Published: 16 Jun 2024, Last Modified: 19 Jul 2024

Venue: HiLD at ICML 2024 Poster

License: CC BY 4.0

Keywords: Empirical theory, double descent, grokking, gradient boosting

TL;DR: We investigate the utility of a telescoping model for neural network learning, consisting of a sequence of linear approximations, as a tool for empirical study of deep learning phenomena.

Abstract: Deep learning sometimes appears to work in unexpected ways. In pursuit of deeper understanding of its surprising behaviors, we investigate the utility of a tractable and accurate model of a neural network consisting of a sequence of first-order approximations _telescoping_ out into a single empirically operational tool for practical analysis. We illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena in the literature -- including double descent, grokking, and the challenges of applying deep learning on tabular data.
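To make the idea of a sequence of first-order approximations concrete, the following is a minimal sketch and not the authors' implementation. It assumes plain full-batch gradient descent and a hypothetical two-parameter toy model `f(theta, x) = a * tanh(b * x)`; the names `f`, `grad_f`, and `telescoped`, the data, and the hyperparameters are all illustrative choices that do not come from the paper. At each training step, the change in the prediction is approximated by the parameter gradient at the current iterate times the parameter update, and these increments are summed on top of the prediction at initialization.

```python
import numpy as np

# Hypothetical toy model (an assumption for illustration): f(theta, x) = a * tanh(b * x)
def f(theta, x):
    a, b = theta
    return a * np.tanh(b * x)

# Analytic gradient of f with respect to theta = (a, b), one row per input.
def grad_f(theta, x):
    a, b = theta
    t = np.tanh(b * x)
    return np.stack([t, a * x * (1.0 - t**2)], axis=-1)  # shape (n, 2)

rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, size=32)
y_train = np.sin(x_train)
x_test = np.linspace(-2.0, 2.0, 5)

theta = np.array([0.5, 0.5])  # illustrative initialization
lr = 0.01                     # illustrative learning rate

# Start the telescoping sum from the prediction at initialization.
telescoped = f(theta, x_test).copy()

for step in range(500):
    # Full-batch gradient of the mean squared error on the training set.
    residual = f(theta, x_train) - y_train
    loss_grad = 2.0 * (residual[:, None] * grad_f(theta, x_train)).mean(axis=0)
    new_theta = theta - lr * loss_grad

    # First-order approximation of this step's change in the test predictions,
    # linearized around the parameters before the update.
    telescoped += grad_f(theta, x_test) @ (new_theta - theta)
    theta = new_theta

exact = f(theta, x_test)
max_gap = float(np.max(np.abs(exact - telescoped)))
print("largest gap between trained model and telescoped sum:", max_gap)
```

Because each increment is linearized around the current parameters rather than around the initialization, the error of the sum is governed by the size of the individual updates, so it tends to shrink with a smaller learning rate in this sketch.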

Student Paper: Yes

Submission Number: 27
