Abstract: Graphs have recently enabled substantial advances in big data analysis. As graphs grow from the billion scale to the trillion scale, efficient graph processing requires large-scale distributed clusters with up to thousands of nodes. In such big data applications the computation is relatively simple, while communication, especially imbalanced communication, is the bottleneck on distributed clusters, where huge numbers of small messages are transferred through 2D-topology networks. Graph partitioning is the dominant factor affecting the performance of large-scale distributed graph processing. Current graph partitioning policies have paid extensive attention to exploiting the power law of big graphs but have failed to exploit the architectural benefits of 2D topology. To address this problem, this paper presents GraphMedia, a communication-balanced graph partitioning scheme for distributed search at scale. The key idea of GraphMedia is to balance communication through hardware/software co-design: the power law of graphs is exploited to even out communication among nodes, and knowledge of the 2D topology is leveraged to balance communication between rows and columns. We validate GraphMedia on both benchmarks and real-world graphs. Specifically, GraphMedia-based Graph500 tests on the Tianhe supercomputer outperform the fastest systems in the latest Graph500 lists (June 2022). Finally, we apply GraphMedia to real-world graphs for online graph media access, where it outperforms state-of-the-art graph partitioning and graph systems by orders of magnitude.
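To make the notion of imbalance on a 2D layout concrete, the following is a minimal sketch and not GraphMedia's algorithm. It assumes a plain cyclic 2D edge partitioning in which edge (u, v) is placed on grid cell (u mod rows, v mod cols), and it uses edge counts as a rough proxy for communication volume. The function names `partition_2d`, `row_col_totals`, and `imbalance` are hypothetical, introduced only for illustration.

```python
from collections import Counter


def partition_2d(edges, rows, cols):
    """Count edges per grid cell, placing edge (u, v) on cell (u % rows, v % cols).

    This is an assumed baseline layout, not the partitioning proposed in the paper.
    """
    load = Counter()
    for u, v in edges:
        load[(u % rows, v % cols)] += 1
    return load


def row_col_totals(load, rows, cols):
    """Sum the per-cell edge counts along each grid row and each grid column."""
    row_totals = [sum(load[(r, c)] for c in range(cols)) for r in range(rows)]
    col_totals = [sum(load[(r, c)] for r in range(rows)) for c in range(cols)]
    return row_totals, col_totals


def imbalance(load, rows, cols):
    """Maximum cell load divided by mean cell load (1.0 means perfectly balanced)."""
    mean = sum(load.values()) / (rows * cols)
    return max(load.values()) / mean


# Toy skewed graph: vertex 0 is a hub linked to vertices 1..8, plus two other edges.
edges = [(0, v) for v in range(1, 9)] + [(1, 2), (3, 4)]
load = partition_2d(edges, rows=2, cols=2)
row_totals, col_totals = row_col_totals(load, 2, 2)
ratio = imbalance(load, 2, 2)
print(row_totals, col_totals, ratio)
```

On this toy input the hub vertex concentrates edges in one grid row, so the row totals are far more skewed than the column totals. A communication-balanced partitioning aims to reduce this kind of skew, both among nodes and between rows and columns.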