Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

28 May 2022, 15:02 (modified: 21 Jul 2022, 01:30); SCIS 2022 Poster; Readers: Everyone
TL;DR: We compare the lazy and standard (feature learning) regimes of deep networks through the lens of example difficulty. We show that feature learning prioritizes easy examples, learning them sooner than difficult ones. This can translate into an enhanced sensitivity to spurious correlations.
Abstract: A recent line of work has identified a so-called ‘lazy regime’ where a deep network can be well approximated by its linearization around initialization throughout training. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficulty. Specifically, we show that easier examples are given more weight in the feature learning regime, so that they are trained faster than more difficult ones. We illustrate this phenomenon across different ways of quantifying example difficulty, including c-score, label noise, and the presence of spurious correlations.
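
To make "linearization around initialization" concrete, the sketch below builds the first-order Taylor expansion of a network in its parameters, f_lin(x; w) = f(x; w0) + ∇_w f(x; w0) · (w − w0). Everything in it is an illustrative assumption rather than the paper's setup: the two-parameter toy model `f`, the hand-derived gradient `grad_f`, and the specific values of `w0`, `near`, and `far` are all hypothetical.

```python
import math

def f(x, w):
    """Hypothetical toy network with two parameters: f(x; w) = a * tanh(b * x)."""
    a, b = w
    return a * math.tanh(b * x)

def grad_f(x, w):
    """Gradient of f with respect to the parameters (a, b), derived by hand."""
    a, b = w
    t = math.tanh(b * x)
    return (t, a * x * (1.0 - t * t))

def f_lin(x, w, w0):
    """Linearization of f in the parameters around the initialization w0."""
    g = grad_f(x, w0)
    return f(x, w0) + sum(gi * (wi - w0i) for gi, wi, w0i in zip(g, w, w0))

# Assumed initialization and input, chosen only for illustration.
w0 = (0.5, -1.0)
x = 0.7

# Parameters close to w0 (lazy-like) and far from w0 (feature-learning-like).
near = (0.51, -0.99)
far = (1.5, 1.0)

err_near = abs(f(x, near) - f_lin(x, near, w0))
err_far = abs(f(x, far) - f_lin(x, far, w0))
print(f"approximation error near init: {err_near:.2e}")
print(f"approximation error far from init: {err_far:.2e}")
```

In this toy case the linearized model tracks the network closely while the parameters stay near `w0` and degrades as they move away, which is the sense in which the lazy regime corresponds to training a linear model.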