I/O-Optimal Cache-Oblivious Sparse Matrix-Sparse Matrix Multiplication

Published: 2022, Last Modified: 21 Jan 2026, IPDPS 2022, CC BY-SA 4.0
Abstract: Data movements between different levels of the memory hierarchy (I/O-transitions, or simply I/Os) are a critical performance bottleneck in modern computing. Therefore, it is a problem of high practical relevance to find algorithms that use a minimal number of I/Os. We present a cache-oblivious sparse matrix-sparse matrix multiplication algorithm that uses a worst-case number of I/Os that matches a previously established lower bound for this problem ($O(N^2/(B \cdot M))$ read-I/Os and $O(N^2/B)$ write-I/Os, where $N$ is the size of the problem instance, $M$ is the size of the fast memory and $B$ is the size of the cache lines). When the output does not need to be stored, the number of write-I/Os can also be reduced to $O(N^2/(B \cdot M))$. This improves the worst-case I/O-complexity of the previously best known algorithm for this problem (which is cache-aware) by a logarithmic multiplicative factor. Compared to other cache-oblivious algorithms, our algorithm improves the worst-case number of I/Os by a multiplicative factor of $\Theta(M \cdot N)$. We show how the algorithm can be applied to produce the first I/O-efficient solution for the sparse 2- vs 3-diameter problem on sparse directed graphs.