DGCBench: A Deep Graph Clustering Benchmark

Benyu Wu; Yue Liu; Qiaoyu Tan; Xinwang Liu; Wei Du; Jun Wang; Guoxian Yu

Published: 18 Sept 2025, Last Modified: 30 Oct 2025. Venue: NeurIPS 2025 Datasets and Benchmarks Track (poster). License: CC BY 4.0

Keywords: deep graph clustering benchmark, deep clustering, unsupervised learning

TL;DR: We propose the first comprehensive and unified benchmark for deep graph clustering, offer an open-source package named PyDGC, and point out promising research directions for DGC from extensive experimental analyses.

Abstract: Deep graph clustering (DGC) aims to partition graph nodes into distinct clusters in an unsupervised manner. Despite rapid advances in this field, DGC remains inherently challenging because ground-truth labels are absent, which complicates the design of effective algorithms and impedes the establishment of standardized benchmarks. The lack of unified datasets, evaluation protocols, and metrics further exacerbates these challenges, making it difficult to systematically assess and compare DGC methods. To address these limitations, we introduce $\texttt{DGCBench}$, the first comprehensive and unified benchmark for DGC methods. It evaluates 12 state-of-the-art DGC methods on 12 datasets from diverse domains and scales, along 6 critical dimensions: $\textbf{discriminability}$, $\textbf{effectiveness}$, $\textbf{scalability}$, $\textbf{efficiency}$, $\textbf{stability}$, and $\textbf{robustness}$. Additionally, we develop $\texttt{PyDGC}$, an open-source Python library that standardizes the DGC training and evaluation paradigm. Through systematic experiments, we reveal persistent limitations in existing methods, specifically the homophily bottleneck, training instability, vulnerability to perturbations, an efficiency plateau, scalability challenges, and poor discriminability, thereby offering actionable insights for future research. We hope that $\texttt{DGCBench}$, $\texttt{PyDGC}$, and our analyses will collectively accelerate progress in the DGC community. The code is available at https://github.com/Marigoldwu/PyDGC.

Code URL: https://github.com/Marigoldwu/PyDGC

Primary Area: Machine learning approaches to data and benchmarks enrichment, augmentation and processing (supervised, unsupervised, online, active, fine-tuning, RLHF, SFT, alignment, etc.)

Submission Number: 1625
