Towards Understanding Parametric Generalized Category Discovery on Graphs

Bowen Deng; Lele Fu; Jialong Chen; Sheng Huang; Tianchi Liao; Zhang Tao; Chuan Chen

Published: 01 May 2025, Last Modified: 23 Jul 2025; ICML 2025 poster; CC BY 4.0

Abstract: Generalized Category Discovery (GCD) aims to identify both known and novel categories in unlabeled data by leveraging knowledge from old classes. However, existing methods are limited to non-graph data and lack theoretical foundations to answer *when and how known classes can help GCD*. We introduce the Graph GCD task and provide the first rigorous theoretical analysis of *parametric GCD*. By quantifying the relationship between old and new classes in the embedding space using the Wasserstein distance W, we derive the first provable GCD loss bound based on W. This analysis highlights two necessary conditions for effective GCD. However, we uncover, through a Pairwise Markov Random Field perspective, that popular graph contrastive learning (GCL) methods inherently violate these conditions. To address this limitation, we propose SWIRL, a novel GCL method for GCD. Experimental results validate our theoretical findings and demonstrate SWIRL's effectiveness.

Lay Summary: In real-world networks like social media or recommendation systems, new types of users or items can appear that AI models have never seen before. We study how AI can automatically discover both known and unknown categories in such networks. We analyze how knowledge from known categories helps recognize new ones and show that many current methods are not well-suited for this task. Based on our findings, we develop a new method called SWIRL, which improves the ability of AI to identify unknown categories in complex network data.

Primary Area: General Machine Learning->Unsupervised and Semi-supervised Learning

Keywords: Generalized Category Discovery, Graph Machine Learning, Representation Learning

Submission Number: 6113
