Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

Thomas George; Guillaume Lajoie; Aristide Baratin

Published: 21 Jul 2022, Last Modified: 04 May 2025; SCIS 2022 Poster; Readers: Everyone

TL;DR: We compare the lazy and standard training regimes of deep networks through the lens of example difficulty. We show that representation learning prioritizes easy examples, learning them faster than difficult ones. This can translate into an enhanced sensitivity to spurious correlations.

Abstract: A recent line of work has identified a so-called ‘lazy regime’ where a deep network can be well approximated by its linearization around initialization throughout training. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficulty. Specifically, we show that easier examples are given more weight in the feature learning regime, so they are learned faster than more difficult ones. We illustrate this phenomenon across different ways to quantify example difficulty, including c-score, label noise, and the presence of spurious correlations.
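To make "linearization around initialization" concrete, here is a minimal sketch of a first-order Taylor expansion of a network in its parameters, f_lin(x; θ) = f(x; θ0) + ∇f(x; θ0) · (θ − θ0). It is an illustration only, not the authors' implementation: the tiny one-hidden-layer tanh network and the names `mlp`, `grad_mlp`, `linearized`, and `perturb` are all assumptions chosen for the example.

```python
import math
import random

# Hypothetical toy model (not from the paper): each hidden unit has
# parameters (w, b, a) and the network outputs sum_j a_j * tanh(w_j * x + b_j).
def mlp(params, x):
    return sum(a * math.tanh(w * x + b) for (w, b, a) in params)

def grad_mlp(params, x):
    """Analytic gradient of mlp w.r.t. each (w, b, a), same layout as params."""
    grads = []
    for (w, b, a) in params:
        h = math.tanh(w * x + b)
        dh = 1.0 - h * h  # derivative of tanh at the pre-activation
        grads.append((a * dh * x, a * dh, h))
    return grads

def linearized(params0, params, x):
    """First-order Taylor expansion of mlp around params0, evaluated at params."""
    out = mlp(params0, x)
    for p0, p, g in zip(params0, params, grad_mlp(params0, x)):
        out += sum(gi * (pi - p0i) for gi, pi, p0i in zip(g, p, p0))
    return out

def perturb(params, scale, rng):
    """Move every parameter by Gaussian noise of the given scale."""
    return [tuple(v + scale * rng.gauss(0.0, 1.0) for v in p) for p in params]

rng = random.Random(0)
theta0 = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(8)]
x = 0.7

near = perturb(theta0, 1e-3, rng)  # parameters barely move: "lazy"-like
far = perturb(theta0, 1.0, rng)    # parameters move a lot

err_near = abs(mlp(near, x) - linearized(theta0, near, x))
err_far = abs(mlp(far, x) - linearized(theta0, far, x))
print(f"linearization error near init: {err_near:.2e}")
print(f"linearization error far from init: {err_far:.2e}")
```

When the parameters stay close to θ0, the linearized model tracks the network closely, which is the situation the lazy regime describes. Far from initialization the approximation generally degrades, and the network's non-linear dependence on its parameters starts to matter.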

Confirmation: Yes

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/lazy-vs-hasty-linearization-in-deep-networks/code)
