Keywords: Continual learning, Class incremental learning, CLIP, Long-Tailed Recognition
Abstract: Pre-trained vision–language models such as CLIP provide strong priors for class-incremental learning (CIL), yet existing methods degrade sharply in long-tailed scenarios.
We demonstrate that CLIP, with only a single lightweight adapter, is sufficient to handle this setting when the CIL process is structured into Intra-Task Stabilization, Inter-Task Preservation, and Knowledge Consolidation.
In Intra-Task Stabilization, we ground the training of current tail classes with Two-Stage Hybrid Augmentation, which anchors learning on CLIP’s text knowledge and refines it with distribution-aware signals. In Inter-Task Preservation, we protect past knowledge with Tail-Aware Semantic Shrinkage, which corrects biased statistics using semantically related head classes, and an Adaptive Margin Hinge Loss, which maintains robust boundaries between old and new classes. Finally, in Knowledge Consolidation, Mode-Connectivity Spherical Merge integrates task-specific adapters along a low-loss path, ensuring their stable unification into a single model. By explicitly linking these three stages, our framework delivers a coherent solution for long-tailed CIL. Experiments on ImageNetSubset with increasing numbers of classes show consistent improvements over prior CLIP-based methods, with margins that grow as the long-tail imbalance becomes more severe.
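The abstract does not spell out the merge operation, but "spherical merge" of adapter weights is commonly realized as spherical linear interpolation (SLERP) between flattened parameter vectors. The sketch below is an assumption-laden illustration of that idea, not the authors' exact procedure; the function name `slerp` and the choice of flattened weight vectors are hypothetical.

```python
import numpy as np

def slerp(w_a: np.ndarray, w_b: np.ndarray, t: float = 0.5) -> np.ndarray:
    """Spherical linear interpolation between two flattened adapter weight
    vectors. Interpolates along the great-circle arc between the two
    weight directions rather than along the straight chord, which is one
    common way to stay near a low-loss path when merging models."""
    a = w_a / np.linalg.norm(w_a)
    b = w_b / np.linalg.norm(w_b)
    dot = np.clip(np.dot(a, b), -1.0, 1.0)
    omega = np.arccos(dot)           # angle between the two weight directions
    if np.isclose(omega, 0.0):       # nearly parallel: fall back to linear mix
        return (1.0 - t) * w_a + t * w_b
    so = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / so) * w_a + (np.sin(t * omega) / so) * w_b
```

For example, merging two orthogonal unit vectors at `t=0.5` yields the midpoint on the arc between them; in a CIL setting, `w_a` and `w_b` would be the flattened weights of the previous merged adapter and the adapter trained on the current task.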
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 8543