Hierarchical Overlapping Clustering on Graphs: Cost Function, Algorithm and Scalability

Published: 01 May 2025, Last Modified: 18 Jun 2025, Venue: ICML 2025 poster, License: CC BY 4.0
TL;DR: We propose a cost function for hierarchical overlapping clustering on graphs and develop an approximation algorithm for it.
Abstract: Overlap and hierarchy are two prevalent phenomena in clustering, and they usually coexist in a single system. Each has been studied separately, but it remains unclear how to characterize and evaluate structures that combine both. To address this issue, we initiate the study of hierarchical overlapping clustering on graphs by introducing a new cost function for it. We justify our cost function via several intuitive properties, and develop an approximation algorithm that achieves a constant approximation factor for its dual version. Our algorithm is a recursive process of overlapping bipartition based on local search, which makes an accelerated version of it extremely scalable. Our experiments demonstrate that the accelerated algorithm significantly outperforms all baseline methods in both effectiveness and scalability on synthetic and real datasets.
Lay Summary: In a social network, individuals often belong to organizations with layered structures. For example, a person might be part of a research group under a specific department at a university, while also having multiple roles like being a teacher, a father, and the captain of a local amateur soccer team. Social networks connect people through this mix of layered and overlapping group structures, a pattern also seen in many other complex networks or systems. A key question is: how can we represent, measure, and analyze such structures? Our paper pioneers this research direction. We use a mathematical tool called a "directed acyclic graph" (a non-looping node-link structure) to describe these organizational patterns. For any network and its corresponding directed acyclic graph, we developed a method to evaluate how accurately the graph captures the network's layered and overlapping group structures. Additionally, we designed an algorithm that efficiently generates such graphs for any network. This algorithm works even on personal computers, handling networks with tens of thousands of nodes in minutes. We believe this theory is crucial for uncovering the true architecture of complex networks and systems and understanding their evolution. The code for this tool has been made freely available online for public use.
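To make the directed-acyclic-graph idea concrete, here is a minimal Python sketch of how such a structure can encode layered and overlapping groups. It is an illustration only, not the paper's implementation: the node names, the `children` mapping, and the helper functions `leaves_under`, `clusters_of`, and `memberships` are all hypothetical. Each internal node stands for a group, each leaf for an individual, and a leaf with more than one parent belongs to overlapping groups.

```python
# Hypothetical toy DAG: internal nodes are groups, leaves are people.
# "alice" has two parents, so she sits in two overlapping groups.
children = {
    "society": ["university", "soccer_team"],
    "university": ["department"],
    "department": ["research_group"],
    "research_group": ["alice", "bob"],
    "soccer_team": ["alice", "carol"],
}


def leaves_under(children, node, cache):
    """Return the set of leaves reachable from `node` (memoized in `cache`)."""
    if node not in cache:
        kids = children.get(node, [])
        if not kids:
            # A node without children is a leaf, i.e. an individual.
            cache[node] = frozenset([node])
        else:
            cache[node] = frozenset().union(
                *(leaves_under(children, k, cache) for k in kids)
            )
    return cache[node]


def clusters_of(children):
    """Map every internal node (group) to the set of individuals it contains."""
    cache = {}
    return {node: leaves_under(children, node, cache) for node in children}


def memberships(children, leaf):
    """List, in sorted order, every group that contains the given individual."""
    return sorted(
        group for group, members in clusters_of(children).items()
        if leaf in members
    )
```

In this toy example, `memberships(children, "alice")` lists both the university chain and the soccer team, while `memberships(children, "carol")` lists only the soccer team and the root. How well a given DAG fits an actual network is what the paper's cost function evaluates; this sketch does not attempt to reproduce that.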
Link To Code: https://github.com/Hardict/HOC
Primary Area: General Machine Learning->Clustering
Keywords: hierarchical overlapping clustering, approximation algorithm, scalability
Submission Number: 3597