Exploring the Limits of Feature Learning in Continual Learning

Published: 10 Oct 2024, Last Modified: 25 Oct 2024
Venue: Continual FoMo Poster
License: CC BY 4.0
Keywords: feature learning, parameterization, scaling limits, deep learning, machine learning, continual learning, catastrophic forgetting, forgetting, continual, ntk, ntp, muP
TL;DR: We perform an empirical study of the role of feature learning and scale in catastrophic forgetting by applying the precepts of the theory of neural network scaling limits.
Abstract: Despite the recent breakthroughs in deep learning, neural networks still struggle to learn continually in non-stationary environments, and the reasons are poorly understood. In this work, we perform an empirical study of the role of feature learning and scale in catastrophic forgetting by applying the precepts of the theory of neural network scaling limits. We interpolate between lazy and rich training regimes, finding that the optimal amount of feature learning is modulated by task similarity. Surprisingly, our results consistently show that more feature learning increases catastrophic forgetting and that scale only helps when it yields more laziness. Supported by empirical evidence on a variety of benchmarks, our work provides the first unified understanding of the role of scale across training regimes and parameterizations for continual learning.
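As an illustration of the kind of lazy-to-rich interpolation the abstract refers to, the sketch below uses the standard output-scaling construction from the lazy-training literature (Chizat et al., 2019); it is not the paper's implementation, and the specific `alpha` value, model, and learning-rate choice are hypothetical. Larger `alpha` pushes training toward the lazy/NTK-like regime, while `alpha = 1` recovers ordinary, richer feature learning.

```python
# Minimal sketch (assumed construction, not the paper's exact method):
# interpolate between lazy and rich regimes with an output-scaling factor alpha.
import copy
import torch
import torch.nn as nn

class ScaledModel(nn.Module):
    """Wraps a base network as f_alpha(x) = alpha * (f_theta(x) - f_theta0(x))."""
    def __init__(self, base: nn.Module, alpha: float):
        super().__init__()
        self.model = base
        self.init_model = copy.deepcopy(base)   # frozen copy at initialization
        for p in self.init_model.parameters():
            p.requires_grad_(False)
        self.alpha = alpha

    def forward(self, x):
        # Centering at initialization keeps the initial output at zero,
        # so alpha only controls how "lazy" the subsequent training is.
        return self.alpha * (self.model(x) - self.init_model(x))

# Hypothetical base network and laziness knob.
base = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
alpha = 10.0
model = ScaledModel(base, alpha)

# Rescale the learning rate by 1/alpha^2 so the output rescaling does not
# trivially change the effective step size.
opt = torch.optim.SGD(model.model.parameters(), lr=0.1 / alpha**2)
```

Sweeping `alpha` (or, equivalently, moving between NTK-style and muP-style parameterizations) is one common way to modulate the amount of feature learning while keeping the architecture fixed.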
Submission Number: 4