TL;DR: We establish the necessary and sufficient conditions for clustering in the LSBM by introducing IAC, the first algorithm to achieve the sufficient condition.
Abstract: In this paper, we investigate the problem of recovering hidden communities in the Labeled Stochastic Block Model (LSBM) with a finite number of clusters whose sizes grow linearly with the total number of nodes. We derive the necessary and sufficient conditions under which the expected number of misclassified nodes is less than $ s $, for any number $ s = o(n) $. To achieve this, we propose IAC (Instance-Adaptive Clustering), the first algorithm whose performance matches the instance-specific lower bounds both in expectation and with high probability.
IAC is a novel two-phase algorithm that consists of a one-shot spectral clustering step followed by iterative likelihood-based cluster assignment improvements. This approach is based on the instance-specific lower bound and notably does not require any knowledge of the model parameters, including the number of clusters. By performing the spectral clustering only once, IAC maintains an overall computational complexity of $ \mathcal{O}(n\, \text{polylog}(n)) $, making it scalable and practical for large-scale problems.
Lay Summary: **(1) Problem:**
Many modern applications—from social networks to biology—seek to uncover “communities” or groups within large, complex networks. Detecting these hidden groups accurately and efficiently, especially as networks grow larger, is a longstanding and challenging problem.
**(2) Solution:**
Our research tackles this challenge by designing a new algorithm, called IAC, which can reveal these groups without knowing any details about the network in advance—such as how many groups there are. IAC works in two main steps: it first takes a quick overall look to find an initial guess of the groups, and then iteratively refines these groupings to improve accuracy, relying on advanced mathematical principles.
**(3) Impact:**
What sets IAC apart is that it not only finds almost all communities correctly, but does so with less computational effort than previous methods, making it suitable for very large datasets. Our work offers new mathematical guarantees for detecting hidden communities and provides a practical tool for analyzing complex networks in a wide range of real-world settings.
Primary Area: General Machine Learning->Clustering
Keywords: Stochastic Block Model, Clustering, Community Detection
Submission Number: 8618
Loading