Abstract: The duality between graphs and matrices means that many common graph analyses can be expressed with primitives such as generalized sparse matrix-vector multiplication (SpMSpV) and sparse matrix-matrix multiplication (SpGEMM). Achieving high performance on these primitives is challenging because of limited arithmetic intensity, irregular memory accesses, and, in the distributed setting, significant network communication requirements. In this paper we implement four graph applications using GraphPad, our optimized multinode implementation of generalized linear algebra primitives such as SpMSpV and SpGEMM. GraphPad is highly flexible: it accommodates multiple data layouts and partitioning strategies, and it incorporates communication optimizations. Our performance at scale can exceed that of CombBLAS by up to 40×. Beyond its performance in a distributed setting, GraphPad on a single node is also within 2× of the performance of GraphMat, a high-performance single-node graph framework, for four out of five benchmarks. We also show that our communication optimizations and flexibility are critical for good performance on both HPC clusters and commodity cloud platforms.
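To illustrate the graph-matrix duality the abstract refers to, here is a minimal single-node sketch of a generalized SpMSpV and a breadth-first search built on top of it. It is not GraphPad's API or implementation: the names `spmspv` and `bfs_levels`, the dictionary-based column storage, and the choice of a Boolean (or, and) semiring are all assumptions made for illustration only.

```python
from collections import defaultdict


def spmspv(columns, x, multiply, add):
    """Generalized y = A * x for a sparse matrix A and a sparse vector x.

    columns[j] maps row index i -> A[i, j] (hypothetical column-wise storage).
    x maps index j -> x[j]. Missing entries are implicit zeros.
    multiply and add are the user-supplied semiring operations.
    """
    y = {}
    for j, xj in x.items():
        # Only columns matching nonzeros of x are touched.
        for i, aij in columns.get(j, {}).items():
            term = multiply(aij, xj)
            y[i] = add(y[i], term) if i in y else term
    return y


def bfs_levels(edges, source):
    """Breadth-first search expressed as repeated SpMSpV over (or, and)."""
    # A[dst, src] = 1 for every edge src -> dst, so A * frontier
    # yields the out-neighbors of the current frontier.
    columns = defaultdict(dict)
    for src, dst in edges:
        columns[src][dst] = 1

    levels = {source: 0}
    frontier = {source: 1}
    depth = 0
    while frontier:
        depth += 1
        reached = spmspv(columns, frontier,
                         multiply=lambda a, b: a and b,
                         add=lambda a, b: a or b)
        # Keep only vertices that have not been visited yet.
        frontier = {v: 1 for v in reached if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels
```

Swapping the two semiring operations changes the analysis without touching the traversal code; for example, passing ordinary multiplication and addition turns `spmspv` into a conventional numeric product. A distributed implementation would additionally have to partition the matrix and exchange frontier entries between nodes, which this sketch does not attempt.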
External IDs: dblp:conf/ipps/AndersonSSPWD16