Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery

Wei He; Xianghan Meng; Zhiyuan Huang; Xianbiao Qi; Rong Xiao; Chun-Guang Li

18 Sept 2025 (modified: 14 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0

Keywords: Generalized Category Discovery

Abstract: Generalized Category Discovery (GCD) aims to identify both known and unknown categories when only partial labels are given for the known categories, which poses a challenging open-set recognition problem. Recently, Visual-Language Models (VLMs) have been employed to learn multi-modal representations for the GCD task. Representation learning approaches for multi-modal GCD usually depend on modality alignment. However, the underlying structure of the distributions has not been sufficiently investigated. In this paper, we propose a novel and effective multi-modal representation learning framework for GCD via Semi-Supervised Rate Reduction, called SSR$^2$-GCD, which learns cross-modality representations with desired structural properties to align the intra-modality relationships. Moreover, we integrate semantic information from prompt candidates by leveraging the inter-modal alignment offered by VLMs. Experiments conducted on generic and fine-grained benchmark datasets demonstrate the superior performance of our approach.

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 14074
