TL;DR: We propose a cost function for hierarchical overlapping clustering on graphs and develop a constant-factor approximation algorithm for its dual version.
Abstract: Hierarchical and overlapping clustering are two prevalent phenomena that often coexist in real-world systems. While numerous studies have examined these two structures separately, characterizing and evaluating their hybrid forms remains an open challenge. To bridge this gap, we initiate the study of hierarchical overlapping clustering on graphs by introducing a new cost function and establishing its rationality through several intuitive properties. We further develop an approximation algorithm that achieves a constant approximation factor for the dual version of this cost function. Our approach employs a recursive overlapping bipartition framework based on local search, which enables a highly scalable speed-up variant. Experimental results demonstrate that this speed-up algorithm significantly outperforms all baseline methods in both effectiveness (across synthetic and real datasets) and scalability.
Lay Summary: In a social network, individuals often belong to organizations with layered structures. For example, a person might be part of a research group under a specific department at a university, while also having multiple roles like being a teacher, a father, and the captain of a local amateur soccer team. Social networks connect people through this mix of layered and overlapping group structures, a pattern also seen in many other complex networks or systems. A key question is: how can we represent, measure, and analyze such structures?
Our paper pioneers this research direction. We use a mathematical tool called a "directed acyclic graph" (a non-looping node-link structure) to describe these organizational patterns. For any network and its corresponding directed acyclic graph, we developed a method to evaluate how accurately the graph captures the network's layered and overlapping group structures. Additionally, we designed an algorithm that efficiently generates such graphs for any network. This algorithm works even on personal computers, handling networks with tens of thousands of nodes in minutes.
We believe this theory is crucial for uncovering the true architecture of complex networks and systems and understanding their evolution. The code for this tool has been made freely available online for public use.
Link To Code: https://github.com/Hardict/HOC
Primary Area: General Machine Learning->Clustering
Keywords: hierarchical overlapping clustering, approximation algorithm, scalability
Submission Number: 3597