Understanding the Forgetting of (Replay-based) Continual Learning via Feature Learning: Angle Matters
Abstract: Continual learning (CL) is crucial for advancing human-level intelligence, but its theoretical understanding, especially regarding the factors that influence forgetting, remains limited. This work builds a unified theoretical framework for understanding CL through feature learning theory. Unlike most existing studies, which analyze forgetting under linear regression models or in the lazy-training regime, we focus on a more practical two-layer convolutional neural network (CNN) with polynomial ReLU activation, trained on sequential tasks drawn from a signal-noise data model. Specifically, we theoretically characterize how the angle between task signal vectors influences forgetting: *acute or small obtuse angles lead to benign forgetting, whereas larger obtuse angles result in harmful forgetting*. Furthermore, we show that the replay method alleviates forgetting by expanding the range of angles corresponding to benign forgetting. Our theoretical results suggest that mid-angle sampling, which selects examples with moderate angles to the class prototype, can enhance the replay method's ability to mitigate forgetting. Experiments on synthetic and real-world datasets confirm our theoretical results and demonstrate the effectiveness of our mid-angle sampling strategy.
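To make the mid-angle sampling idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes the class prototype is the mean feature vector and uses closeness to the median angle as the stand-in for "moderate angle"; both choices are assumptions for illustration.

```python
import numpy as np

def mid_angle_sample(features: np.ndarray, budget: int) -> np.ndarray:
    """Select `budget` replay exemplars whose angle to the class prototype
    is closest to the median angle over the class (a proxy for 'moderate').

    features: (n, d) array of per-example feature vectors for one class.
    Returns the indices of the selected examples.
    """
    # Class prototype: normalized mean feature vector (assumption).
    prototype = features.mean(axis=0)
    prototype /= np.linalg.norm(prototype) + 1e-12

    # Angle of each example to the prototype via cosine similarity.
    normed = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    cos_sim = normed @ prototype
    angles = np.arccos(np.clip(cos_sim, -1.0, 1.0))

    # Keep examples whose angle is nearest the median, i.e. neither almost
    # aligned with the prototype nor nearly orthogonal/opposed to it.
    mid = np.median(angles)
    order = np.argsort(np.abs(angles - mid))
    return order[:budget]

# Usage: pick 20 replay exemplars per class from 500 stored feature vectors.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 64))
replay_idx = mid_angle_sample(feats, budget=20)
```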
Lay Summary: How can we help AI systems learn continuously without forgetting what they already know? This challenge, known as catastrophic forgetting, limits AI's ability to learn like humans. The problem is especially complex in realistic neural networks that learn features from data, and it is not well understood theoretically. We built a simple yet practical model to study it, using a two-layer neural network that processes tasks one by one. We found that forgetting depends on how similar the tasks are, measured by the angle between their core signals: tasks that are moderately similar cause little forgetting, while highly dissimilar tasks cause much more. Replay methods, which mix past examples into training, help by expanding the range of task similarities that are safe. Building on this, we propose a smarter replay strategy, mid-angle sampling, which selects examples of moderate similarity. Our theory and experiments show that this method reduces forgetting and improves learning stability. This insight can help design AI systems that better retain knowledge over time.
Primary Area: Theory->Learning Theory
Keywords: Continual learning, Feature learning theory, Replay method
Submission Number: 8610