Scalable Attribute-Missing Graph Clustering via Neighborhood Differentiation

Yaowenhu; Wenxuan Tu; Yue Liu; Xinhang Wan; Junyi Yan; Taichun Zhou; Xinwang Liu

Published: 01 May 2025, Last Modified: 18 Jun 2025

Venue: ICML 2025 poster

License: CC BY 4.0

TL;DR: We address large-scale deep graph clustering under missing attributes and bridge the gap between Multi-View Clustering and Deep Graph Clustering.

Abstract: Deep graph clustering (DGC), which aims to partition the nodes of an attributed graph into clusters without supervision, has shown substantial potential in industrial scenarios such as community detection and recommendation. However, real-world attributed graphs, e.g., social network interactions, are usually large-scale and attribute-missing. To address these two problems, we propose a novel DGC method termed **C**omplementary **M**ulti-**V**iew **N**eighborhood **D**ifferentiation ($\textit{CMV-ND}$), which preprocesses graph structural information into multiple views in a complete but non-redundant manner. First, to ensure completeness of the structural information, we propose a recursive neighborhood search that explores the local structure of the graph by fully expanding node neighborhoods across different hop distances. Second, to eliminate the redundancy between neighborhoods at different hops, we introduce a neighborhood differential strategy that ensures no node appears in more than one differential hop representation. Third, we construct $K+1$ complementary views from the $K$ differential hop representations and the features of the target node. Finally, we apply existing multi-view clustering or DGC methods to these views. Experimental results on six widely used graph datasets demonstrate that CMV-ND significantly improves the performance of various methods.
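The view construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `differential_hops` and `build_views`, the toy graph, the use of breadth-first search to obtain non-overlapping hop sets, mean aggregation over each hop, and the zero-vector fallback for missing attributes are all assumptions made for illustration only.

```python
from collections import deque

def differential_hops(adj, source, K):
    """Return [N_1, ..., N_K]; N_k holds the nodes whose shortest-path
    distance from `source` is exactly k, so the sets never overlap."""
    dist = {source: 0}
    hops = [set() for _ in range(K)]
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == K:          # do not expand beyond K hops
            continue
        for v in adj.get(u, ()):
            if v not in dist:     # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                hops[dist[v] - 1].add(v)
                queue.append(v)
    return hops

def build_views(adj, features, source, K, dim):
    """Return K+1 vectors: the node's own features, then one aggregate
    per differential hop. Mean aggregation is an assumption here."""
    def mean(nodes):
        # Skip nodes whose attributes are missing (stored as None).
        rows = [features[n] for n in nodes if features.get(n) is not None]
        if not rows:
            return [0.0] * dim    # assumed fallback when nothing is available
        return [sum(col) / len(rows) for col in zip(*rows)]

    own = features.get(source)
    views = [list(own) if own is not None else [0.0] * dim]
    views.extend(mean(h) for h in differential_hops(adj, source, K))
    return views

# Hypothetical toy graph; node 2 has missing attributes.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: None, 3: [4.0, 4.0], 4: [2.0, 6.0]}

hops = differential_hops(adj, 0, K=3)
views = build_views(adj, features, 0, K=3, dim=2)
```

For node 0 with K = 3, the hop sets are {1, 2}, {3}, and {4}, so the node has four views in total. Because every node is assigned to the hop of its shortest distance, a node reachable by several paths (such as node 3) contributes to exactly one view.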

Lay Summary: Real-world graphs, such as those underlying social media and e-commerce platforms, are often massive and incomplete because many nodes lack attribute information. Grouping similar nodes in such graphs is essential for tasks like community detection and personalized recommendation. However, this becomes highly challenging when the graphs are too large or the attribute data is missing. We propose a new method that captures more comprehensive structural information for each node by examining multiple neighborhood levels—for example, direct neighbors, second-hop neighbors, and beyond. Each resulting “view” offers unique and complementary insights, contributing to a richer and more informative node representation. To avoid redundancy, we retain only the distinct information at each level, similar to peeling an onion to reveal non-overlapping layers. This layered perspective helps clustering algorithms more effectively identify structural patterns among nodes. Evaluated on six real-world datasets, our method consistently outperforms existing approaches, demonstrating its effectiveness in analyzing large-scale, incomplete graphs.

Primary Area: General Machine Learning->Clustering

Keywords: Deep Graph Clustering, Multi-view clustering, Graph Neighborhood Difference, Information Redundancy

Submission Number: 1584
