Keywords: Continual Learning
TL;DR: We introduce CoLaP, a language-guided prompt selection framework that leverages multimodal LLMs to improve generalization and robustness in continual learning by mitigating reliance on biased visual encoders.
Abstract: Continual learning (CL) aims to enable models to learn a sequence of new tasks without forgetting previously acquired knowledge. Prompt-based approaches, which adapt small prompt parameters while keeping a large pre-trained backbone frozen, have become a popular strategy to reduce forgetting. However, most existing methods rely solely on visual encoders to guide prompt selection, which leaves them vulnerable to distribution shifts: biased visual representations can select the wrong prompts and lead to severe forgetting. We propose CoLaP, a language-guided prompt selection framework that leverages multimodal models to address this limitation. During training, each input is converted into a rich textual description that provides semantic guidance for training the visual prompt selector. The prompt pool is constructed from clustered concepts that are unique to each dataset, reflecting its specific distribution. At inference, the learned visual selector operates purely on images, preserving efficiency while maintaining the balance between plasticity and stability. Extensive experiments on both in-distribution and out-of-distribution benchmarks show that purely visual prompt methods degrade as the number of tasks grows, whereas our language-informed approach achieves superior generalization and robustness. These results highlight the promise of multimodal semantic guidance for scalable and resilient continual learning.
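The sketch below is not the authors' code; it is a minimal illustration, under assumed details, of the idea described in the abstract: a prompt pool with learnable keys (in practice derived from clustered concepts), a lightweight visual selector trained so that its prompt choices agree with those induced by language-derived features, and image-only selection at inference. The module names, the shared embedding dimension, and the distillation-style objective are all assumptions for illustration.

```python
# Minimal sketch (assumptions, not the authors' implementation) of
# language-guided prompt selection: a frozen visual encoder and a frozen text
# encoder are assumed to map into a shared d-dimensional space (CLIP-style).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptPool(nn.Module):
    def __init__(self, num_prompts: int, prompt_len: int, dim: int):
        super().__init__()
        # Keys would come from clustered concept embeddings in practice;
        # random initialization here for illustration only.
        self.keys = nn.Parameter(torch.randn(num_prompts, dim))
        self.prompts = nn.Parameter(torch.randn(num_prompts, prompt_len, dim))

    def select(self, query: torch.Tensor, top_k: int = 1):
        # query: (B, dim); score every key and pick the best-matching prompts.
        sim = F.normalize(query, dim=-1) @ F.normalize(self.keys, dim=-1).T  # (B, P)
        idx = sim.topk(top_k, dim=-1).indices                                # (B, k)
        return self.prompts[idx], sim


class VisualSelector(nn.Module):
    """Maps frozen image features into the key space; at inference it is the
    only selection pathway, so no text is needed."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, img_feat: torch.Tensor) -> torch.Tensor:
        return self.proj(img_feat)


def selection_loss(selector: VisualSelector, pool: PromptPool,
                   img_feat: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
    """Hypothetical training objective: align the visual selector's prompt
    scores with the scores induced by the language-derived features."""
    _, vis_sim = pool.select(selector(img_feat))
    with torch.no_grad():
        _, text_sim = pool.select(text_feat)
    return F.kl_div(vis_sim.log_softmax(-1), text_sim.softmax(-1),
                    reduction="batchmean")
```

At inference, only `VisualSelector` and `PromptPool.select` are used on image features, which matches the abstract's claim that the text branch is needed during training only.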
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 2763