Keywords: Correlation Clustering, Structural Balance, Property Testing
Abstract: Correlation clustering is an important unsupervised learning problem with broad applications. In this problem, we are given a labeled complete graph $G=(V,E^+ \cup E^-)$, and the optimal clustering is defined as a partition of the vertices that minimizes the number of $+$ edges between clusters plus the number of $-$ edges within clusters. We investigate efficient algorithms to test the \emph{cost} of correlation clustering: we want to know whether the graph can be (nearly) perfectly clustered (with $0$ cost) or is far from admitting any perfect clustering. The problem has attracted significant attention aimed at modern large-scale applications, and the state-of-the-art results use $\widetilde{O}({1}/{\varepsilon^7})$ queries and time to decide whether a graph is perfectly clusterable or requires flipping the labels of at least $\varepsilon {\binom n 2}$ edges to become clusterable. In this paper, we improve this bound significantly by designing an algorithm that uses ${O}({1}/{\varepsilon^2})$ queries and time. Furthermore, we derive the first algorithm that tests the cost for the special setting of correlation clustering with $k$ clusters, using ${O}(1/{\varepsilon^4})$ queries and time for constant $k$. Finally, for the special case of $k=2$, which corresponds to the strong structural balance problem in social networks, we obtain tight bounds of $\Theta({1}/{\varepsilon})$ queries, which are the first \emph{tight} bounds for these problems. We conduct experiments on simulated and real-world datasets, and the empirical results demonstrate the advantages of our algorithms.
Primary Area: learning theory
Submission Number: 21003
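As a minimal illustration of the cost defined in the abstract, the sketch below counts the disagreements of a given partition on a labeled complete graph: $+$ edges that cross clusters plus $-$ edges that stay within a cluster. It is not the paper's testing algorithm; it reads every edge, whereas the testers described above query only a small sample. The function name `clustering_cost` and the input encoding (vertices `0..n-1`, a list of $+$ edges, every other pair treated as $-$) are assumptions made for this example.

```python
from itertools import combinations


def clustering_cost(n, positive_edges, labels):
    """Count disagreements of a partition on a labeled complete graph.

    Hypothetical helper for illustration only.
    n              -- number of vertices, named 0..n-1
    positive_edges -- iterable of (u, v) pairs labeled '+';
                      every other pair is treated as '-'
    labels         -- labels[v] is the cluster id of vertex v
    """
    positive = {frozenset(edge) for edge in positive_edges}
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = labels[u] == labels[v]
        is_positive = frozenset((u, v)) in positive
        # A '+' edge should be inside a cluster and a '-' edge across
        # clusters; any mismatch counts as one disagreement.
        if is_positive != same_cluster:
            cost += 1
    return cost
```

For example, with three vertices where only the pair `(0, 1)` is labeled $+$, the partition `[0, 0, 1]` has cost $0$, so that graph is perfectly clusterable. If `(1, 2)` is also labeled $+$ while `(0, 2)` stays $-$, every partition has cost at least $1$.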