Abstract: This paper studies the unsupervised clustering of large graphs generated from the heterogeneous Stochastic Block Model. We present a sketch-based community detection algorithm that substantially reduces computational complexity by clustering only a small set of nodes sampled from the full graph and then applying a retrieval algorithm. We first identify cases in which existing algorithms achieve lower error rates when all nodes have the same average number of intra-cluster connections. This behavior is demonstrated for both convex-optimization-based and spectral algorithms. Based on this insight, we develop SPIN, a degree-based sampling method that produces sketches with cluster proportions more favorable for successful clustering. By sampling nodes with probability inversely proportional to their degrees, SPIN exploits this reduction in error to significantly improve the phase transition compared with full-graph clustering.
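
The degree-based sampling idea can be illustrated with a minimal sketch. This is not the SPIN implementation from the paper: the function names `inverse_degree_sample` and `two_clique_graph`, the toy graph, the choice to skip isolated nodes, and the use of Efraimidis-Spirakis keys for weighted sampling without replacement are all assumptions made for illustration. The clustering of the sketch and the retrieval step are not shown.

```python
import random


def two_clique_graph(size_a, size_b):
    """Toy graph (hypothetical): two disjoint cliques of unequal size."""
    n = size_a + size_b
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same_cluster = (i < size_a) == (j < size_a)
            if i != j and same_cluster:
                adjacency[i][j] = 1
    return adjacency


def inverse_degree_sample(adjacency, sketch_size, seed=0):
    """Draw distinct nodes with weights proportional to 1 / degree.

    Returns the sorted sampled node indices and the induced sub-adjacency
    matrix (the "sketch").
    """
    rng = random.Random(seed)
    keyed = []
    for node, row in enumerate(adjacency):
        degree = sum(row)
        if degree == 0:
            continue  # assumption: isolated nodes are never sampled
        weight = 1.0 / degree
        # Efraimidis-Spirakis key u ** (1 / weight); keeping the largest
        # keys yields a weighted sample without replacement.
        keyed.append((rng.random() ** (1.0 / weight), node))
    if sketch_size > len(keyed):
        raise ValueError("sketch_size exceeds the number of sampleable nodes")
    keyed.sort(reverse=True)
    nodes = sorted(node for _, node in keyed[:sketch_size])
    sketch = [[adjacency[i][j] for j in nodes] for i in nodes]
    return nodes, sketch


# Cluster A has 6 nodes of degree 5; cluster B has 3 nodes of degree 2.
adjacency = two_clique_graph(6, 3)
nodes, sketch = inverse_degree_sample(adjacency, sketch_size=4)
print(nodes)
```

In this toy graph, nodes in the smaller cluster have lower degree and therefore higher sampling weight, so the sketch tends to be more balanced across clusters than a uniform sample would be.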