Abstract: Few-Shot Class-Incremental Learning has shown remarkable efficacy in learning new concepts efficiently from limited annotations. Nevertheless, heuristic few-shot annotations may not always cover the most informative samples, which largely restricts the capability of the incremental learner. We instead start from a large pool of unlabeled data and annotate the most informative samples for incremental learning. To this end, this paper introduces Active Class-Incremental Learning (ACIL). The objective of ACIL is to select the most informative samples from the unlabeled pool so as to train an incremental learner effectively and maximize the performance of the resulting model. Note that vanilla active learning algorithms suffer from a class-imbalanced distribution among the annotated samples, which restricts the ability of incremental learning. To achieve both class balance and informativeness in the chosen samples, we propose the $\textbf{C}$lass-$\textbf{B}$alanced $\textbf{S}$election ($\textbf{CBS}$) strategy. Specifically, we first cluster the features of all unlabeled images into multiple groups. Then, for each cluster, we employ a greedy selection strategy to ensure that the Gaussian distribution of the sampled features closely matches the Gaussian distribution of all unlabeled features within the cluster. CBS can be plugged into CIL methods that are based on pretrained models with prompt tuning. Extensive experiments under the ACIL protocol across five diverse datasets demonstrate that CBS outperforms both random selection and other SOTA active learning approaches.
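The two steps described in the abstract (cluster the unlabeled features, then greedily pick samples per cluster so that their Gaussian statistics match those of the cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmeans`, `gaussian_gap`, `greedy_select`, and `class_balanced_select` are hypothetical, the clustering is a plain Lloyd's k-means, and the distance between Gaussians (squared mean difference plus squared Frobenius norm of the covariance difference) is an assumed stand-in, since the abstract does not specify the measure used.

```python
import numpy as np

def kmeans(features, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns a label per row."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels

def gaussian_gap(subset, mean_all, cov_all):
    """Assumed surrogate distance between the subset and cluster Gaussians."""
    mean_gap = np.sum((subset.mean(axis=0) - mean_all) ** 2)
    cov_sub = np.atleast_2d(np.cov(subset, rowvar=False, bias=True))
    return mean_gap + np.sum((cov_sub - cov_all) ** 2)

def greedy_select(cluster_feats, budget):
    """Add one sample at a time, each time choosing the one that minimizes the gap."""
    mean_all = cluster_feats.mean(axis=0)
    cov_all = np.atleast_2d(np.cov(cluster_feats, rowvar=False, bias=True))
    chosen, remaining = [], list(range(len(cluster_feats)))
    for _ in range(min(budget, len(remaining))):
        best = min(
            remaining,
            key=lambda i: gaussian_gap(cluster_feats[chosen + [i]], mean_all, cov_all),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen

def class_balanced_select(features, n_clusters, per_cluster):
    """Cluster the pool, then select the same budget from every non-empty cluster."""
    labels = kmeans(features, n_clusters)
    selected = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        if len(idx) == 0:
            continue
        local = greedy_select(features[idx], per_cluster)
        selected.extend(idx[local].tolist())
    return selected

# Toy pool: three synthetic blobs standing in for image features.
rng = np.random.default_rng(0)
pool = np.concatenate([rng.normal(loc=m, size=(50, 4)) for m in (-5.0, 0.0, 5.0)])
picked = class_balanced_select(pool, n_clusters=3, per_cluster=5)
```

In this sketch, `picked` holds indices into the unlabeled pool that would be sent for annotation. Drawing the same budget from every cluster is what approximates class balance here, under the assumption that clusters roughly correspond to classes.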
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: Our submission to ACM MM introduces a novel Active Class-Incremental Learning (ACIL) method, initially applied to image classification tasks using multimodal models designed for image datasets. The core innovation of our approach lies not only in its application to image data but also in its underlying multimodal framework (i.e., CLIP), which makes it inherently adaptable to other modalities. While our current research showcases ACIL in the context of image classification, the principles and techniques we have developed are designed to be broadly applicable, making it straightforward to extend them to tasks involving other single or combined modalities.
By presenting our methodology and its successful application in a multimodal context, we aim to encourage the multimedia research community to further explore the integration of such approaches into their work, especially in scenarios that require active and continual learning. This aligns with the ACM MM Conference's goal of fostering innovation in multimedia technology by providing a practical framework that addresses the challenges of active and incremental learning across diverse data types and modalities. Through this contribution, we invite researchers and practitioners alike to consider the implications of our findings and methodologies for enhancing the capabilities of multimodal learning systems in various applications.
Supplementary Material: zip
Submission Number: 497