Parallelization of butterfly counting on hierarchical memory

Published: 01 Jan 2024, Last Modified: 17 Apr 2025VLDB J. 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Butterfly (a cyclic graph motif) counting is a fundamental task with many applications in graph analysis, which aims at computing the number of butterflies in a large graph. With the rapid growth of graph data, it is more and more challenging to do butterfly counting due to the super-linear time complexity and large memory consumption. In this paper, we study I/O-efficient algorithms for doing butterfly counting on hierarchical memory. Existing algorithms of this kind cannot guarantee I/O optimality. Observing that in order to count butterflies, it suffices to “witness” a subgraph instead of the whole structure, a new class of algorithms called semi-witnessing algorithm is proposed. We prove that a semi-witnessing algorithm is not restricted by the lower bound \(\varOmega (\frac{|E|^2}{MB})\) of a witnessing algorithm, and give a new bound of \(\varOmega (\min (\frac{|E|^2}{MB}, \frac{|E||V|}{\sqrt{M}B}))\). Subsequently, we develop the \(\textsf{IOBufs}\) algorithm that manages to approach the I/O lower bound, and thus claim its optimality. Finally, we investigate the parallelization of \(\textsf{IOBufs}\) to improve its performance and scalability. To support various hardware configurations, we introduce a general parallel framework, \(\textsf{PIOBufs}\). Our analysis indicates that the key to implementing \(\textsf{PIOBufs}\) on multi-core CPUs lies in the fine-grained task division. Furthermore, we extend the CPU-tailored \(\textsf{PIOBufs}\) to harness the extensive parallelism that GPUs provide. Our experimental results show that \(\textsf{IOBufs}\) performs better than established algorithms such as \(\textsf{EMRC}\), \(\textsf{BFC}\)-\(\textsf{EM}\) and \(\textsf{G}\)-\(\textsf{BFC}\). Thanks to its I/O-efficient design, \(\textsf{IOBufs}\) can handle large graphs that exceed the main memory capacity on both CPUs and GPUs. A significant result is that \(\textsf{IOBufs}\) can manage butterfly counting on the Clueweb graph, which has 37 billion edges and quintillions (\(10^{18}\)) of butterflies.
Loading