GraphCSR: A Space and Time-Efficient Sparse Matrix Representation for Web-scale Graph Processing

Published: 29 Jan 2025, Last Modified: 29 Jan 2025, WWW 2025 Oral, CC BY 4.0
Track: Systems and infrastructure for Web, mobile, and WoT
Keywords: Graph representation, CSR, Sorted graph, Graph processing
Abstract: Graph data processing is essential for web-scale applications, including social networks, recommendation systems, and Web of Things (WoT) systems, where large, sparsely connected graphs dominate. Traditional sparse matrix storage formats such as compressed sparse row (CSR) face significant memory and performance bottlenecks in distributed, federated, and edge-based computing environments, which are increasingly central to the web. To address this challenge, we propose GraphCSR, a novel storage format that clusters vertices with identical edge degrees and stores only the starting index of each group. This approach minimizes memory overhead and facilitates batch memory access while enhancing overall performance, making it particularly suitable for federated systems and resource-constrained edge nodes. Our experiments across various graph operations and large datasets show that GraphCSR achieves considerable memory savings and performance gains in large-scale, distributed graph processing. When deployed on a production-grade supercomputer with 79,024 computing nodes, GraphCSR outperforms the top-ranked system on the Graph 500 list, demonstrating its potential for scaling web and WoT graph processing in large-scale distributed computing systems.
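The idea of grouping vertices by degree and keeping one starting index per group can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names (`build_degree_grouped`, `neighbors`), the tuple layout, and the use of a `rank` dictionary are assumptions made for illustration only. It shows how, once vertices are sorted by degree, a per-vertex row-pointer array can be replaced by one (degree, first rank, offset) triple per distinct degree.

```python
from bisect import bisect_right

def build_degree_grouped(adj):
    """Build a degree-grouped layout from adj: dict vertex -> list of neighbors.

    Hypothetical layout for illustration, not the paper's exact format.
    """
    # Sort vertices by degree (ties broken by id) so that vertices with the
    # same degree occupy a contiguous range of ranks.
    order = sorted(adj, key=lambda v: (len(adj[v]), v))
    # In a pre-sorted graph the rank would simply be the vertex id; the
    # dictionary is only needed here because ids are kept unchanged.
    rank = {v: r for r, v in enumerate(order)}

    group_degree = []  # degree shared by every vertex in the group
    group_first = []   # rank of the first vertex in the group
    group_offset = []  # index in cols where the group's edges begin
    cols = []          # concatenated neighbor lists, in rank order

    for r, v in enumerate(order):
        d = len(adj[v])
        if not group_degree or group_degree[-1] != d:
            group_degree.append(d)
            group_first.append(r)
            group_offset.append(len(cols))
        cols.extend(adj[v])
    return order, rank, group_degree, group_first, group_offset, cols

def neighbors(layout, v):
    """Return the neighbor list of v using only the per-group metadata."""
    _, rank, group_degree, group_first, group_offset, cols = layout
    r = rank[v]
    # Find the group whose rank range contains r.
    i = bisect_right(group_first, r) - 1
    d = group_degree[i]
    # Every vertex in the group has exactly d edges, so the start position
    # is computed arithmetically instead of being stored per vertex.
    start = group_offset[i] + (r - group_first[i]) * d
    return cols[start:start + d]

adj = {0: [1, 2], 1: [0], 2: [0], 3: []}
layout = build_degree_grouped(adj)
```

In this toy graph there are four vertices but only three distinct degrees, so three group entries stand in for the five row pointers a plain CSR would store. The saving grows when many vertices share a degree, which is typical of the skewed degree distributions in web-scale graphs.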
Submission Number: 310