ApproxG: Fast Approximate Parallel Graphlet Counting Through Accuracy Control

Published: 2018, Last Modified: 06 Aug 2024, CCGrid 2018, CC BY-SA 4.0
Abstract: Graphlet counting is a methodology for detecting local structural properties of large graphs that has been in use for over a decade. Despite tremendous effort in optimizing its performance, even 3- and 4-node graphlet counting routines may run for hours or days on highly optimized systems. In this paper, we describe how a synergistic combination of approximate computing with parallel computing can result in multiplicative performance improvements in graphlet counting runtimes with minimal and controllable loss of accuracy. Specifically, we describe two novel techniques, multi-phased sampling for statistical accuracy guarantees and cost-aware sampling to further improve performance on multi-machine runs, which reduce the query time on large graphs from tens of hours to several minutes or seconds with only <1% relative error.
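To make the idea of accuracy-controlled sampling concrete, the following minimal sketch estimates the count of one 3-node graphlet (the triangle) by sampling edges in phases and stopping once a confidence interval is tight enough. It is not ApproxG's algorithm: the abstract does not specify the estimator, so the edge-sampling estimator, the normal-approximation stopping rule, and every name and parameter here (`approx_triangles`, `target_rel_err`, `phase_size`, `max_phases`, `z`) are assumptions chosen purely for illustration. It runs on a single machine and does not attempt cost-aware sampling.

```python
import math
import random


def build_adjacency(edges):
    """Build neighbor sets from a list of unique undirected edges (no self-loops)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def exact_triangles(edges):
    """Exact triangle count: each triangle is seen once from each of its 3 edges."""
    adj = build_adjacency(edges)
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def approx_triangles(edges, target_rel_err=0.05, phase_size=200,
                     max_phases=50, z=1.96, seed=0):
    """Estimate the triangle count by phased uniform edge sampling.

    Hypothetical illustration only. After each phase, a normal-approximation
    confidence interval is computed for the mean number of common neighbors
    per edge; sampling stops once its relative half-width drops to
    target_rel_err, or when max_phases is exhausted.
    Returns (estimate, number_of_samples_used).
    """
    m = len(edges)
    if m == 0:
        return 0.0, 0
    rng = random.Random(seed)
    adj = build_adjacency(edges)
    n = 0
    total = 0.0
    total_sq = 0.0
    mean = 0.0
    for _ in range(max_phases):
        for _ in range(phase_size):
            u, v = edges[rng.randrange(m)]   # uniform edge, with replacement
            c = len(adj[u] & adj[v])         # triangles containing this edge
            n += 1
            total += c
            total_sq += c * c
        mean = total / n
        # Unbiased sample variance, clamped against floating-point round-off.
        var = max(total_sq / n - mean * mean, 0.0) * n / (n - 1)
        half_width = z * math.sqrt(var / n)
        if mean > 0 and half_width / mean <= target_rel_err:
            break                            # accuracy target reached
    # Sum of common-neighbor counts over all edges equals 3 * (#triangles).
    return m * mean / 3, n
```

On a complete graph every edge has the same number of common neighbors, so the interval collapses after the first phase and the estimate matches the exact count; on irregular graphs the number of phases grows with the variance of per-edge counts, which is the trade-off an accuracy-control parameter such as `target_rel_err` is meant to expose.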