vGraph: Memory-Efficient Multicore Graph Processing for Traversal-Centric Algorithms

Menghan Jia, Yiming Zhang, Xinbiao Gan, Dongsheng Li, Erci Xu, Ruibo Wang, Kai Lu

Published: 01 Jan 2022, Last Modified: 12 May 2023SC 2022Readers: Everyone

Abstract: To lower the monetary/energy cost, single-machine multicore graph processing is gaining increasing attention for a wide range of traversal-centric graph algorithms such as BFS, SSSP, CC, and PageRank, of which the processing is relatively simple and the topology data (vertices and edges) dominates the memory footprint. This paper presents <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$v$</tex> Graph, a NUMA-aware, memory-efficient multicore graph processing system for traversal-centric algorithms. <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$v$</tex> Graph proposes an ultralight NUMA-aware graph preprocessing scheme which eliminates almost all complex preprocessing steps and pipelines per-NUMA graph loading and compressing, to effectively reduce inter-NUMA memory accesses while keeping both preprocessing cost and peak memory footprint low. We further optimize <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$v$</tex> Graph with effective HPC techniques including prefetching and work-stealing. Evaluation on a 384GB-memory, four-NUMA machine shows that compared to the state-of-the-art NUMA-aware/-unaware systems, <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$v$</tex> Graph can process much larger real-world and synthetic graphs with various traversal-centric algorithms, achieving significantly higher memory efficiency and lower processing time.

0 Replies