G-store: high-performance graph store for trillion-edge processing

Published: 2016, Last Modified: 07 May 2025SC 2016EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: High-performance graph processing brings great benefits to a wide range of scientific applications, e.g., biology networks, recommendation systems, and social networks, where such graphs have grown to terabytes of data with billions of vertices and trillions of edges. Subsequently, storage performance plays a critical role in designing a high-performance computer system for graph analytics. In this paper, we present G-Store, a new graph store that incorporates three techniques to accelerate the I/O and computation of graph algorithms. First, G-Store develops a space-efficient tile format for graph data, which takes advantage of the symmetry present in graphs as well as a new smallest number of bits representation. Second, G-Store utilizes tile-based physical grouping on disks so that multi-core CPUs can achieve high cache and memory performance and fully utilize the throughput from an array of solid-state disks. Third, G-Store employs a novel slide-cache-rewind strategy to pipeline graph I/O and computing. With a modest amount of memory, G-Store utilizes a proactive caching strategy in the system so that all fetched graph data are fully utilized before evicted from memory. We evaluate G-Store on a number of graphs against two state-of-the-art graph engines and show that G-Store achieves 2 to 8× saving in storage and outperforms both by 2 to 32×. G-Store is able to run different algorithms on trillion-edge graphs within tens of minutes, setting a new milestone in semi-external graph processing system.
Loading