Local Clustering over Labeled Graphs: An Index-Free Approach

ICDE 2022 (modified: 31 Jan 2023)
Abstract: In this paper, we study local clustering over labeled graphs, which extracts a subgraph around a seed node whose nodes have both high label density with respect to the query labels and high structural density. Despite the progress made in recent years, we observe two major limitations of existing methods: (I) candidate subgraphs must comply with strict topology-driven models, so better candidates can be pruned by these topological constraints; (II) the topological constraints incur substantial computational overhead, forcing existing works to construct prohibitively large indexes for online processing. To mitigate these limitations, we explore the use of conductance in local clustering, which ensures structural density by minimizing conductance. Conductance is a well-understood metric for detecting unlabeled clusters, but applying it directly to labeled graphs is insufficient because label information is not taken into account. To this end, we propose a novel Label-Aware Motif weighted framework (LAM) that transforms the labeled graph into a weighted graph capturing both the label and the structural proximity of nodes. We define label-aware motifs as small higher-order structures over nodes carrying query labels. Nodes within a label-aware motif are both closely connected and relevant to the query labels, which eases the identification of labeled clusters. Our theoretical study shows that, on random graphs generated by the stochastic block model, LAM better distinguishes the desired candidates under the Personalized PageRank distribution from the seed node. Based on these properties of LAM, we propose an index-free peeling algorithm that efficiently searches for local clusters on labeled graphs.
Extensive experiments on both real-world and synthetic networks show that our proposed algorithm achieves up to 90% relative effectiveness improvement (F1 score) while using 10 times less memory than the state-of-the-art algorithm.
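To make the conductance-based local clustering pipeline concrete, the sketch below implements the standard approach the abstract builds on: an approximate Personalized PageRank vector computed from the seed node, followed by a sweep that returns the lowest-conductance prefix. This is a generic illustration, not the paper's LAM algorithm; the graph, seed, and parameter values are made up for demonstration.

```python
def approx_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Push-based approximate Personalized PageRank from `seed`."""
    p, r = {}, {seed: 1.0}  # PPR estimates and residual mass
    queue = [seed]
    while queue:
        u = queue.pop()
        ru = r.get(u, 0.0)
        deg = len(adj[u])
        if ru < eps * deg:          # residual too small to push
            continue
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = 0.0
        share = (1 - alpha) * ru / deg
        for v in adj[u]:            # spread remaining mass to neighbors
            r[v] = r.get(v, 0.0) + share
            if r[v] >= eps * len(adj[v]):
                queue.append(v)
    return p

def conductance(adj, S, vol_total):
    """cut(S, V\\S) / min(vol(S), vol(V\\S))."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol = sum(len(adj[u]) for u in S)
    denom = min(vol, vol_total - vol)
    return cut / denom if denom else 1.0

def sweep_cut(adj, ppr):
    """Order nodes by degree-normalized PPR; keep the best prefix."""
    vol_total = sum(len(vs) for vs in adj.values())
    order = sorted(ppr, key=lambda u: ppr[u] / len(adj[u]), reverse=True)
    best, best_phi = set(), float("inf")
    for k in range(1, len(order) + 1):
        phi = conductance(adj, order[:k], vol_total)
        if phi < best_phi:
            best, best_phi = set(order[:k]), phi
    return best, best_phi

# Toy graph: two triangles joined by a single bridge edge (2-3).
# Seeding in one triangle should recover it as the low-conductance cluster.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
cluster, phi = sweep_cut(adj, approx_ppr(adj, seed=0))
print(cluster, phi)  # -> {0, 1, 2} with conductance 1/7
```

The paper's contribution sits in front of this pipeline: LAM reweights the graph using label-aware motifs so that a sweep like the one above favors clusters that are both structurally dense and relevant to the query labels, without precomputed indexes.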