Communication-Optimal Distributed Graph Clustering under Duplication Models

22 Sept 2022 (modified: 13 Feb 2023) | ICLR 2023 Conference Withdrawn Submission | Readers: Everyone
Keywords: Graph Clustering, Distributed Computation, Communication Complexity, Duplication Models
Abstract: We consider the problem of clustering the nodes of a large-scale distributed graph whose edges, possibly with duplicates, are observed across multiple sites. Although edge duplicates across sites may appear beneficial at first glance, they can in fact complicate the clustering task, since processing them may require extra computation and communication. We propose the first communication-optimal algorithms for two well-established communication models, namely the message passing model and the blackboard model. Specifically, given a graph on $n$ nodes with edges observed at $s$ sites, our algorithms achieve communication costs of $\tilde{O}(ns)$ and $\tilde{O}(n+s)$ in the message passing and blackboard models, respectively, where $\tilde{O}$ hides polylogarithmic factors. These costs nearly match the corresponding lower bounds of $\Omega(ns)$ and $\Omega(n+s)$. Under a mild assumption on the edge distribution, the communication costs are asymptotically the same as those under non-duplication models. Our algorithms also guarantee clustering quality nearly as good as that obtained by centralizing all edges and then applying any standard clustering algorithm.
Please Choose The Closest Area That Your Submission Falls Into: Theory (e.g., control theory, learning theory, algorithmic game theory)