TL;DR: In this work, we show that in order to obtain a better-than-2 approximation for Constrained Correlation Clustering, solving the (exponentially large) Constrained Cluster LP would be sufficient.
Abstract: In the Correlation Clustering problem, we are given an undirected graph and are tasked with computing a clustering (partition of the nodes) that minimizes the sum of the number of edges across different clusters and the number of non-edges within clusters. In the constrained version of this problem, the goal is to compute a clustering that satisfies additional hard constraints mandating certain pairs to be in the same cluster and certain pairs to be in different clusters. Constrained Correlation Clustering is APX-Hard, and the best known approximation factor is 3 (van Zuylen et al. [SODA '07]). In this work, we show that in order to obtain a better-than-2 approximation, solving the (exponentially large) Constrained Cluster LP would be sufficient.
[The peer-reviewed version of this article claimed an efficient algorithm for solving the Constrained Cluster LP. An error in the proof, which the authors discovered after the review process, led them to revise the results to be conditional on the existence of a valid LP solution.]
Lay Summary: When teaching machines to learn from data without explicit supervision—a setting known as unsupervised learning—a central objective is to group data into clusters based on inherent similarities. In many practical scenarios, however, incorporating expert knowledge can significantly improve clustering quality, giving rise to semi-supervised learning.
This paper addresses a semi-supervised learning problem known as Constrained Correlation Clustering. The setting involves a set of elements, along with expert-provided similarity judgments for certain pairs—these are considered reliable. For the remaining pairs, we have less reliable estimates of similarity.
The goal is to cluster the elements such that:
- Pairs deemed dissimilar by the expert lie in different clusters (cannot-link),
- Pairs deemed similar by the expert do not end up in different clusters (must-link),
- The number of violated similarity estimates is minimized.
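As an illustration of the objective and constraints listed above, the following is a minimal sketch in Python and not the paper's algorithm. It assumes the less reliable similarity estimates are given as graph edges (edge means "similar", non-edge means "dissimilar"), as in the abstract. The function names `clustering_cost` and `satisfies_constraints`, and the toy instance, are hypothetical and introduced only for this example.

```python
from itertools import combinations


def clustering_cost(nodes, edges, cluster_of):
    """Count violated similarity estimates for a given clustering.

    A pair is violated if it is an edge whose endpoints lie in different
    clusters, or a non-edge whose endpoints lie in the same cluster.
    """
    edge_set = {frozenset(e) for e in edges}
    cost = 0
    for u, v in combinations(nodes, 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_edge = frozenset((u, v)) in edge_set
        if is_edge != same_cluster:
            cost += 1
    return cost


def satisfies_constraints(cluster_of, must_link, cannot_link):
    """Check the hard constraints supplied by the expert."""
    ok_must = all(cluster_of[u] == cluster_of[v] for u, v in must_link)
    ok_cannot = all(cluster_of[u] != cluster_of[v] for u, v in cannot_link)
    return ok_must and ok_cannot


# Toy instance (assumed for illustration): a path a - b - c - d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
must_link = [("a", "b")]
cannot_link = [("b", "c")]

# Candidate clustering: {a, b} and {c, d}.
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1}

feasible = satisfies_constraints(cluster_of, must_link, cannot_link)
cost = clustering_cost(nodes, edges, cluster_of)
print(feasible, cost)
```

In this toy instance the candidate clustering respects both expert constraints, and the only violated estimate is the edge between `b` and `c`, which the cannot-link constraint forces to be cut. The sketch only evaluates a given clustering; finding a good one is the hard part that the paper addresses.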
In this paper, we draw connections between two fundamental techniques from the literature, paving the way for improved and conceptually simple algorithms.
Primary Area: Theory
Keywords: Clustering, Constrained Correlation Clustering, Approximation
Submission Number: 12312