TL;DR: We study densely connected clusters in graphs and introduce two sparsification algorithms that preserve the structure of these clusters in both undirected graphs and directed ones.
Abstract: Graph clustering is an important algorithmic technique for analysing massive graphs, and has been widely applied in many research fields of data science. While the objective of most graph clustering algorithms is to find a vertex set of low conductance, there has been a sequence of recent studies that highlight the importance of the inter-connection between clusters when analysing real-world datasets. Following this line of research, in this work we study bipartite-like clusters and present efficient and online algorithms that find such clusters in both undirected graphs and directed ones. We conduct experimental studies on both synthetic and real-world datasets, and show that our algorithms significantly speed up the running time of existing clustering algorithms while preserving their effectiveness.
Lay Summary: We introduce two sparsification algorithms designed to find bipartite-like clusters in both undirected and directed graphs. These cluster structures are prevalent in real-world networks like trade, migration, and communication. Unlike traditional methods that focus on within-group interactions, our approach highlights connections between groups. Our algorithms create sparsifiers that significantly reduce graph size while preserving these essential bipartite structures. For undirected graphs, we guarantee the presence of $k$ bipartite clusters by maintaining the $k$-way dual Cheeger constant, a guarantee we extend to directed graphs. These algorithms run in nearly-linear time, rely purely on random sampling, and are simple to implement in an online setting. Our experimental results, spanning both real and synthetic datasets, demonstrate significant speed improvements compared to current methods, all while maintaining clustering accuracy.
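To make the idea of sampling-based sparsification concrete, the following is a minimal Python sketch and not the algorithm from the paper. It assumes an undirected weighted graph given as a list of `(u, v, w)` edges, and it uses a hypothetical degree-based sampling rule with a free constant `c`; the paper's actual sampling probabilities are not stated here. The helper `bipartiteness_ratio` computes $2\,w(L,R)/\mathrm{vol}(L \cup R)$, the quantity that the dual Cheeger constant maximises over disjoint pairs $(L, R)$, so it can be used to check how well a bipartite-like pair survives sampling.

```python
import random
from collections import defaultdict


def weighted_degrees(edges):
    """Return the weighted degree of every vertex in an undirected edge list."""
    deg = defaultdict(float)
    for u, v, w in edges:
        deg[u] += w
        deg[v] += w
    return deg


def bipartiteness_ratio(edges, left, right):
    """Compute 2 * w(L, R) / vol(L ∪ R) for disjoint vertex sets L and R.

    The value is 1 when every edge touching L ∪ R crosses between L and R.
    """
    left, right = set(left), set(right)
    deg = weighted_degrees(edges)
    cross = sum(
        w
        for u, v, w in edges
        if (u in left and v in right) or (u in right and v in left)
    )
    volume = sum(deg[x] for x in left | right)
    return 2.0 * cross / volume if volume > 0 else 0.0


def sample_sparsifier(edges, c, seed=0):
    """Illustrative sparsifier (assumed rule, not the paper's).

    Each edge {u, v} of weight w is kept independently with probability
    p = min(1, c * w * (1/deg(u) + 1/deg(v))) and reweighted to w / p,
    so the expected weight of every edge is unchanged.
    """
    rng = random.Random(seed)
    deg = weighted_degrees(edges)
    kept = []
    for u, v, w in edges:
        p = min(1.0, c * w * (1.0 / deg[u] + 1.0 / deg[v]))
        if rng.random() < p:
            kept.append((u, v, w / p))
    return kept
```

Because each edge is processed independently using only the degrees of its endpoints, a rule of this shape can be applied as edges arrive, provided degree information is available. Comparing `bipartiteness_ratio` on the original and sampled edge lists for a fixed pair `(L, R)` gives a quick empirical check of how well the structure is preserved for a given `c`.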
Primary Area: General Machine Learning->Clustering
Keywords: bipartite-like components, sparsification, online algorithms
Submission Number: 11820