Abstract: Foundational vision-language models (VLMs) excel across diverse tasks, but adapting them to new domains without forgetting prior knowledge remains a critical challenge. Continual Learning (CL) addresses this challenge by enabling models to learn sequentially from new data while mitigating the forgetting of prior information, typically under supervised settings involving label shift. Nonetheless, abrupt distribution shifts can still cause substantial forgetting, potentially nullifying the benefits of supervised updates, especially when storing or replaying past data is infeasible. In this work, we propose leveraging unlabeled test-time data in an unsupervised manner to reinforce prior task performance without requiring replay or stored examples. Unlike traditional Test-Time Adaptation (TTA), which primarily focuses on domain shift or corruption, our method improves performance on earlier tasks by exploiting representative test samples encountered during deployment. We introduce a simple teacher-student framework with gradient-based sparse parameter updates, and show that it effectively mitigates forgetting in class-incremental CL for VLMs, offering a memory-free alternative to episodic replay with strong empirical results.
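To make the adaptation recipe described in the abstract concrete, the snippet below is a minimal, hypothetical sketch of one teacher-student test-time adaptation step with gradient-based sparse parameter updates. It is not the paper's implementation: the KL-based pseudo-labeling loss, the per-tensor top-k gradient mask, and the EMA teacher schedule are assumptions chosen only to illustrate the general paradigm of adapting on unlabeled test batches without replay.

```python
# Hypothetical sketch of teacher-student test-time adaptation with sparse
# gradient updates. The loss, sparsity rule, and EMA schedule are assumptions,
# not the paper's method.
import torch
import torch.nn.functional as F

def adapt_on_test_batch(student, teacher, optimizer, images,
                        sparsity=0.1, ema_momentum=0.999):
    """One unsupervised adaptation step on an unlabeled test batch."""
    # Teacher provides soft pseudo-targets; no labels or replay buffer are used.
    with torch.no_grad():
        targets = F.softmax(teacher(images), dim=-1)

    student_logits = student(images)
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1), targets,
                    reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()

    # Sparse update: keep only the top `sparsity` fraction of gradient entries
    # (by magnitude) in each parameter tensor; zero out the rest.
    for p in student.parameters():
        if p.grad is None:
            continue
        g = p.grad.abs().flatten()
        k = max(1, int(sparsity * g.numel()))
        threshold = torch.topk(g, k).values.min()
        p.grad.mul_((p.grad.abs() >= threshold).float())

    optimizer.step()

    # Slowly update the teacher as an EMA of the student (assumed schedule).
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema_momentum).add_(ps, alpha=1.0 - ema_momentum)

    return loss.item()
```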
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: - Revised the introduction, softening our claim to be the first to utilise test-time data (in red, for reviewer bpwk)
- Added a citation for "Adaptive Retention & Correction: Test-Time Training for Continual Learning" by Chen et al., ICLR 2025, and contrasted it with our work (highlighted in red, for reviewer bpwk)
- Modified Section 3.1 on parameter sparsity (highlighted in blue, for reviewer 5vju)
- Modified the conclusion to position generative tasks as future work (highlighted in blue, for reviewers 5vju and QMv7)
- Expanded Appendix 8.4 on the limitations of DoSAPP (highlighted in dark pink, for reviewer QMv7)
- Added Appendix 8.7 on forgetting in long-sequence scenarios with domain shift (highlighted in dark pink, for reviewer QMv7)
- Added datasets for sparsity threshold in Appendix 8.9
- Added an explanation of teacher logits in Appendix 8.10 (highlighted in dark pink, for reviewer QMv7)
- Softened our emphasis on computation time reduction in Appendix 8.13 (highlighted in blue for reviewer 5vju)
Assigned Action Editor: ~Andrew_Kyle_Lampinen1
Submission Number: 6299