TL;DR: We compare the lazy and standard regimes of deep networks through the lens of example difficulty. We show that representation learning speeds up the learning of easy examples relative to difficult ones. This can translate into an enhanced sensitivity to spurious correlations.
Abstract: A recent line of work has identified a so-called ‘lazy regime’ where a deep network can be well approximated by its linearization around initialization throughout training. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficulty. Specifically, we show that easier examples are given more weight in the feature learning regime, so they are learned faster than more difficult ones. We illustrate this phenomenon across different ways to quantify example difficulty, including c-score, label noise, and the presence of spurious correlations.
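
To make "linearization around initialization" concrete, here is a minimal sketch of a first-order Taylor expansion of a model in its parameters. The one-unit tanh model and the names `f`, `grad_f`, `f_lin`, and `linearization_gap` are illustrative assumptions, not the networks or code used in this work. The sketch only shows that the linearized model matches the original at initialization and drifts away from it as the parameters move.

```python
import math

def f(x, a, w):
    """Toy nonlinear model (an assumption for illustration): a * tanh(w * x)."""
    return a * math.tanh(w * x)

def grad_f(x, a, w):
    """Analytic gradient of f with respect to the parameters (a, w)."""
    t = math.tanh(w * x)
    return (t, a * x * (1.0 - t * t))

def f_lin(x, a, w, a0, w0):
    """First-order Taylor expansion of f in the parameters around (a0, w0).

    This is the 'lazy' surrogate: linear in (a, w), still nonlinear in x.
    """
    da, dw = grad_f(x, a0, w0)
    return f(x, a0, w0) + da * (a - a0) + dw * (w - w0)

def linearization_gap(x, a0, w0, step):
    """Absolute gap between f and f_lin after shifting both parameters by `step`."""
    a, w = a0 + step, w0 + step
    return abs(f(x, a, w) - f_lin(x, a, w, a0, w0))

if __name__ == "__main__":
    x, a0, w0 = 1.0, 0.5, 0.3
    for step in (0.0, 0.01, 0.1, 1.0):
        print(step, linearization_gap(x, a0, w0, step))
```

In this toy setting, the gap is zero at initialization and grows with the parameter shift, so the linear approximation is accurate only while the parameters stay close to their initial values.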