Abstract: Large language models (LLMs) suffer catastrophic forgetting of previous tasks when they are fine-tuned on new tasks sequentially. Existing continual learning (CL) methods often require task-specific memory, continuity of the training paradigm, or architecture expansion. To avoid the privacy, accessibility, overhead, and other practical concerns these requirements raise, this paper addresses a strict CL setting in which only the latest model and data are available and model capacity is fixed. We propose Gradient Spectrum Rescaling (GSR), a memory-free, plug-and-play, and in-place CL approach that prioritizes under-utilized directions to mitigate forgetting of important learned knowledge. Specifically, GSR adaptively rescales the singular components of gradients based on layerwise singular value decomposition (SVD). Experiments on 5 text generation tasks demonstrate GSR's ability to mitigate forgetting while maintaining task performance.
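The abstract's core mechanism (SVD of a layer's gradient followed by rescaling of its singular components) can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the function name `gsr_rescale`, the parameter `alpha`, and the specific geometric-interpolation rescaling rule are all hypothetical stand-ins for whatever adaptive scheme GSR uses.

```python
import numpy as np


def gsr_rescale(grad, alpha=0.5):
    """Illustrative sketch of spectrum rescaling on a layer gradient.

    Decomposes the gradient via SVD and flattens its singular spectrum,
    damping dominant directions (assumed to carry previously learned,
    important knowledge) so that under-utilized directions gain relative
    weight. The rescaling rule below is a hypothetical choice for
    illustration; GSR's adaptive rule may differ.
    """
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    if s.min() <= 0:
        return grad  # degenerate spectrum: leave the gradient unchanged
    # Hypothetical rule: geometrically interpolate each singular value
    # toward the smallest one, shrinking the spread of the spectrum.
    s_rescaled = s**alpha * s.min() ** (1 - alpha)
    return U @ np.diag(s_rescaled) @ Vt


# Example: rescale a random "gradient" matrix for one layer.
rng = np.random.default_rng(0)
g = rng.standard_normal((8, 4))
g_new = gsr_rescale(g)
```

After rescaling, the ratio between the largest and smallest singular values shrinks, so updates along previously dominant directions are relatively suppressed.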
Paper Type: Short
Research Area: Language Modeling
Research Area Keywords: continual learning, fine-tuning, generative models
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, Chinese, Akkadian, Sumerian
Submission Number: 8427