Keywords: Unsupervised Learning, Class Incremental Learning, Visual-Text Complementary Guidance
TL;DR: This paper proposes a visual-text complementary guidance method for unsupervised class incremental learning.
Abstract: Continual learning from unlabeled data streams while effectively combating catastrophic forgetting poses a formidable challenge. Traditional methods predominantly rely on visual clustering techniques to generate pseudo labels, which are often noisy and of suboptimal quality, severely hindering model evolution. To overcome these obstacles, we introduce an approach that combines visual and textual information to generate dual-space hybrid pseudo labels for reliable continual model evolution. Specifically, we first leverage large multimodal models to generate generalizable text descriptions for a few representative samples. These descriptions then undergo a 'coarse-to-fine' refinement process to capture the subtle nuances between different data points, significantly enhancing their semantic accuracy. In parallel, a novel cross-modal hybrid strategy integrates these fine-grained textual descriptions with visual features, creating a more robust and reliable supervisory signal. Finally, these descriptions are employed in a semantic alignment distillation that exploits the stability of language knowledge to alleviate catastrophic forgetting and prevent the model from losing previously learned information. Comprehensive experiments on a variety of benchmarks demonstrate that our proposed method attains state-of-the-art performance, and ablation studies further substantiate its effectiveness and superiority.
Primary Area: General machine learning (supervised, unsupervised, online, active, etc.)
Submission Number: 4870