Measuring Representational Shifts in Continual Learning: A Linear Transformation Perspective

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We introduce a new metric, representation discrepancy, to track how the internal knowledge of an AI model changes over time. Using this metric, we find that forgetting occurs faster in deeper layers of the network and that wider networks tend to forget less.
Abstract: In continual learning scenarios, catastrophic forgetting of previously learned tasks is a critical issue, making it essential to measure such forgetting effectively. Recently, there has been growing interest in representation forgetting, i.e., forgetting measured at the hidden layers. In this paper, we provide the first theoretical analysis of representation forgetting and use this analysis to better understand the behavior of continual learning. First, we introduce a new metric called representation discrepancy, which measures the difference between the representation spaces constructed by two snapshots of a model trained through continual learning. We demonstrate that the proposed metric serves as an effective surrogate for representation forgetting while remaining analytically tractable. Second, through mathematical analysis of our metric, we derive several key findings about the dynamics of representation forgetting: forgetting occurs more rapidly and to a greater degree as the layer index increases, while increasing the width of the network slows down the forgetting process. Third, we support our theoretical findings through experiments on real image datasets, including Split-CIFAR100 and ImageNet1K.
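To make the idea of a representation discrepancy concrete, below is a minimal sketch of one way to compare the representation spaces of two model snapshots, in line with the paper's linear-transformation perspective. The exact definition used in the paper is not reproduced here; the function name `representation_discrepancy`, the ridge term, and the choice of a relative least-squares residual are illustrative assumptions, not the authors' formula.

```python
import numpy as np

def representation_discrepancy(Z_old, Z_new, ridge=1e-6):
    """Hypothetical surrogate for representation discrepancy.

    Z_old: (n, d_old) features of a given layer from an earlier snapshot,
           evaluated on a fixed probe set of n inputs.
    Z_new: (n, d_new) features of the same layer from a later snapshot,
           evaluated on the same probe set.
    Returns the relative residual of the best ridge-regularized linear map
    from the new representation to the old one: a value near 0 means the
    old representation is still linearly recoverable (little forgetting).
    """
    d_new = Z_new.shape[1]
    # Solve min_W ||Z_new @ W - Z_old||_F^2 + ridge * ||W||_F^2 in closed form.
    gram = Z_new.T @ Z_new + ridge * np.eye(d_new)
    W = np.linalg.solve(gram, Z_new.T @ Z_old)
    residual = np.linalg.norm(Z_new @ W - Z_old) ** 2
    return residual / np.linalg.norm(Z_old) ** 2

# Toy usage: compare a layer's features before and after training on a new task.
rng = np.random.default_rng(0)
Z_old = rng.standard_normal((512, 128))
Z_new = Z_old @ rng.standard_normal((128, 128)) + 0.1 * rng.standard_normal((512, 128))
print(representation_discrepancy(Z_old, Z_new))
```

Under this sketch, tracking the value per layer over successive tasks would let one probe the paper's findings, e.g., whether deeper layers show larger discrepancies and whether wider networks show smaller ones.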
Lay Summary: Artificial intelligence (AI) models often struggle to retain past knowledge when they are trained on new tasks one after another—a problem known as catastrophic forgetting. While recent studies have shown that this forgetting happens not just in the model's output but deep within its internal representations, understanding this process mathematically has remained a challenge. Our research introduces a new, easy-to-analyze metric called representation discrepancy to track how much the internal knowledge of an AI model changes over time. This allows us to study the model’s “memory” more precisely. Using this metric, we found that forgetting happens faster in deeper layers of the network and that wider networks tend to forget less. Our work offers new insights into how AI models learn—and forget—over time. This understanding is a key step toward building smarter, more reliable AI systems that can learn continuously without losing what they’ve already mastered.
Primary Area: Theory->Deep Learning
Keywords: Continual Learning, Representation Learning, Representation Forgetting
Submission Number: 2422