Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery
Keywords: Generalized Category Discovery
Abstract: Generalized Category Discovery (GCD) aims to identify both known and unknown categories when only partial labels are given for the known categories, which poses a challenging open-set recognition problem. Recently, Visual-Language Models (VLMs) have been employed to learn multi-modal representations for the GCD task. Representation learning approaches for multi-modal GCD usually depend on modality alignment. However, the underlying structure of the distributions has not been sufficiently investigated. In this paper, we propose a novel multi-modal representation learning framework for GCD via Semi-Supervised Rate Reduction, called SSR$^2$-GCD, which learns cross-modality representations with desired structural properties to align the intra-modality relationships. Moreover, we integrate semantic information from prompt candidates by leveraging the inter-modal alignment offered by VLMs. Experiments conducted on generic and fine-grained benchmark datasets demonstrate the superior performance of our approach.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 14074