Few-Shot Class-Incremental Learning via Cross-Modal Alignment with Feature Replay

Published: 01 Jan 2024, Last Modified: 21 Jul 2025, PRCV (1) 2024, CC BY-SA 4.0
Abstract: Few-shot class-incremental learning (FSCIL) studies the problem of continually learning novel concepts from limited training data without catastrophically forgetting the old ones. While most existing works are built on the premise of learning from scratch, growing effort has been devoted to incorporating the benefits of pre-trained Vision-Language Models (VLMs) into FSCIL solutions, since these models have shown powerful generalization in zero-shot/few-shot learning. In this paper, we propose a simple yet effective FSCIL framework that leverages the prior knowledge of the CLIP model to address the stability-plasticity dilemma. Considering the semantic gap between the pre-trained and downstream data, we first combine soft prompts with visual adaptation to effectively accommodate the prior knowledge from both branches. Then, we condition the textual prototype on each visual input to adaptively capture instance-specific information, taking into account their intrinsically heterogeneous structures. On top of this framework, we employ a simple feature replay strategy that models each class as a Gaussian distribution to alleviate task interference in each new session. Extensive experiments on three benchmarks, i.e., CIFAR100, CUB200 and miniImageNet, show that the proposed method achieves compelling FSCIL results.
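The Gaussian feature replay strategy mentioned in the abstract can be illustrated with a minimal sketch: each class is summarized by the mean and covariance of its extracted features, and pseudo-features are sampled from these Gaussians during later sessions to mitigate interference. The function names, the covariance shrinkage term, and the sampling interface below are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def fit_class_gaussians(features, labels):
    """Model each class's feature distribution as a Gaussian (mean, covariance)."""
    stats = {}
    for c in np.unique(labels):
        fc = features[labels == c]
        mu = fc.mean(axis=0)
        # Shrink the covariance toward identity for numerical stability,
        # since few-shot classes yield very few samples per class (assumed choice).
        cov = np.cov(fc, rowvar=False) + 1e-3 * np.eye(fc.shape[1])
        stats[c] = (mu, cov)
    return stats

def sample_replay(stats, n_per_class, seed=None):
    """Draw pseudo-features from the stored class Gaussians for replay."""
    rng = np.random.default_rng(seed)
    feats, labels = [], []
    for c, (mu, cov) in stats.items():
        feats.append(rng.multivariate_normal(mu, cov, size=n_per_class))
        labels.append(np.full(n_per_class, c))
    return np.concatenate(feats), np.concatenate(labels)
```

Storing only per-class statistics rather than raw exemplars keeps the memory footprint constant per class, which is the usual appeal of distribution-based replay in incremental learning.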