Keywords: Imbalanced graph clustering; Large language models; Canonical correlation alignment; Minority-aware mixup representation learning
TL;DR: This paper presents a novel framework TRACI for imbalanced text-attributed graph clustering, which leverages large language models to generate balanced, mixed groups with an emphasis on minority classes.
Abstract: Graph neural networks (GNNs) have achieved remarkable progress in text-attributed graph clustering. However, these approaches assume that classes are uniformly distributed, which limits their applicability in real-world, imbalanced scenarios. To address this, this paper studies the problem of imbalanced text-attributed graph clustering and proposes a novel framework named Text-guided Group Mixup with Canonical Mining (TRACI). The core of TRACI lies in generating mixed groups with an emphasis on minority classes, guided by large language models (LLMs). In particular, we first utilize LLMs to produce multiple views of each sample and randomly assign samples to balanced groups with mixed semantics for consistency learning. To further enhance robustness, we employ LLMs to compute correlation scores between samples and the synthesized groups, thereby reinforcing minority-aware group representations. In addition, we encourage canonical correlations between augmented views of nodes to ensure semantic alignment. Extensive experiments on several benchmark datasets validate the effectiveness of the proposed TRACI, demonstrating clear advantages over state-of-the-art baselines under class-imbalanced conditions. The source code is available at https://anonymous.4open.science/r/TRACI-E087.
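The canonical-correlation alignment mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is a generic CCA-style self-supervised objective (an invariance term between two augmented views plus a decorrelation penalty on each view's feature dimensions), written as an assumption about the general technique, not a reproduction of TRACI's actual loss; the function name and the weight `lam` are illustrative.

```python
import numpy as np

def cca_alignment_loss(z1, z2, lam=1e-3):
    """CCA-style alignment between two view embeddings z1, z2 of shape (n, d).

    Illustrative sketch: invariance term pulls the standardized views
    together; decorrelation term pushes each view's feature covariance
    toward the identity to avoid dimensional collapse.
    """
    n, d = z1.shape
    # Standardize each view per feature dimension, scaled so that
    # z.T @ z approximates the feature correlation matrix.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8) / np.sqrt(n)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8) / np.sqrt(n)
    # Invariance: corresponding nodes should embed similarly across views.
    invariance = ((z1 - z2) ** 2).sum()
    # Decorrelation: each view's correlation matrix should be near identity.
    eye = np.eye(d)
    decorrelation = (((z1.T @ z1) - eye) ** 2).sum() + (((z2.T @ z2) - eye) ** 2).sum()
    return invariance + lam * decorrelation
```

With identical views the invariance term vanishes and only the decorrelation penalty remains, so the loss measures how far the shared representation is from whitened features.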
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 12084