Signed Laplacians for Constrained Graph Clustering

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 spotlight poster, CC BY 4.0
TL;DR: We propose an algorithm that uses signed Laplacians to solve the constrained clustering problem.
Abstract: Given two weighted graphs $G = (V, E, w_G)$ and $H = (V, F, w_H)$ defined on the same vertex set, the constrained clustering problem seeks a subset $S \subset V$ that minimises the cut ratio $w_G(S, V \setminus S) / w_H(S, V \setminus S)$. In this work, we establish a Cheeger-type inequality that relates the solution of the constrained clustering problem to the spectral properties of $G$ and $H$. To reduce computational complexity, we use the signed Laplacian of $H$, which streamlines calculations while maintaining accuracy. By solving a generalised eigenvalue problem, our proposed algorithm achieves notable performance improvements, particularly in challenging scenarios where traditional spectral clustering methods struggle. We demonstrate its practical effectiveness through experiments on both synthetic and real-world datasets.
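The abstract does not spell out the algorithm, so the following is only a minimal sketch of the objective and a generic spectral relaxation of it, not the authors' method. It assumes dense symmetric weight matrices `WG` and `WH` and a connected $H$. The sketch evaluates the cut ratio $w_G(S, V \setminus S) / w_H(S, V \setminus S)$, relaxes it to the plain generalised eigenproblem $L_G x = \lambda L_H x$ (with the unsigned Laplacian of $H$, not the signed-Laplacian formulation the paper proposes), and rounds the eigenvector with a threshold sweep. Both Laplacians vanish on the all-ones vector, so the problem is restricted to vectors with zero sum; there, $L_H$ is positive definite when $H$ is connected, as `scipy.linalg.eigh` requires. The helper `signed_laplacian` assumes that "signed Laplacian of $H$" means $D_H + A_H$, i.e. every edge of $H$ treated as negative, and is included only to show the identity $L_H + (D_H + A_H) = 2D_H$. All function names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh, null_space


def laplacian(W):
    """Combinatorial Laplacian L = D - A of a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W


def signed_laplacian(W):
    """Assumed signed Laplacian D + A: every edge of W is treated as a negative edge."""
    return np.diag(W.sum(axis=1)) + W


def cut_weight(W, S):
    """Total weight w(S, V \\ S) of edges crossing the cut given by boolean mask S."""
    return W[np.ix_(S, ~S)].sum()


def cut_ratio(WG, WH, S):
    """Objective w_G(S, V \\ S) / w_H(S, V \\ S); infinite if no H-edge crosses the cut."""
    denom = cut_weight(WH, S)
    return np.inf if denom == 0 else cut_weight(WG, S) / denom


def spectral_sweep(WG, WH):
    """Relax min_S cut_ratio to L_G x = lam L_H x on {x : sum(x) = 0}, then return
    the best threshold ("sweep") cut of the bottom eigenvector.
    Assumes H is connected, so the projected L_H is positive definite."""
    n = WG.shape[0]
    Q = null_space(np.ones((1, n)))      # orthonormal basis of the zero-sum subspace
    LG = Q.T @ laplacian(WG) @ Q
    LH = Q.T @ laplacian(WH) @ Q
    vals, vecs = eigh(LG, LH)            # generalised symmetric-definite eigenproblem
    x = Q @ vecs[:, 0]                   # eigenvector of the smallest eigenvalue
    order = np.argsort(x)
    best_S, best_val = None, np.inf
    for k in range(1, n):                # each prefix of the sorted order is a candidate S
        S = np.zeros(n, dtype=bool)
        S[order[:k]] = True
        val = cut_ratio(WG, WH, S)
        if val < best_val:
            best_S, best_val = S, val
    return best_S, best_val, vals[0]


# Usage: G is two triangles joined by one bridge edge; H links every pair of vertices.
WG = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    WG[u, v] = WG[v, u] = 1.0
WH = np.ones((6, 6)) - np.eye(6)
S, ratio, lam = spectral_sweep(WG, WH)
print(np.flatnonzero(S).tolist(), ratio, lam)
```

Because the centred indicator vector of any cut is a feasible point of the relaxation, the smallest eigenvalue `lam` is a lower bound on the best achievable cut ratio; the sweep then returns an actual subset $S$. In this toy example the sweep recovers one of the two triangles.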
Lay Summary: People naturally group things: we put clothes in closets, friends into social circles, and photos into albums. Computers also need to group things, for example to organise users on social networks, sort weather stations by climate, or recommend products based on customer behaviour. This process is called clustering. However, sometimes we have extra information that should guide how things are grouped. For instance, we might know that two weather stations are in the same region and must be grouped together (this is a must-link). Or we might know that two stations are in very different climates and should not be grouped together (this is a cannot-link). Traditional clustering algorithms do not use this kind of guidance. Our research developed a new mathematical method that allows the computer to follow these human-like rules. We use a tool called a signed Laplacian, which helps balance the natural structure of the data with the extra must-link and cannot-link rules. Our algorithm is both more accurate and faster than existing approaches. This helps computers mimic how humans group things, not only by seeing patterns but also by respecting rules. It can improve applications in climate science, public health, education, and other areas where both data and expert knowledge matter.
Primary Area: General Machine Learning->Clustering
Keywords: signed Laplacians, constrained clustering, Cheeger inequality
Submission Number: 11451