Stream-K: Work-Centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU

Published: 01 Jan 2023, Last Modified: 06 Nov 2024. Venue: PPoPP 2023. License: CC BY-SA 4.0
Abstract: We introduce Stream-K, a work-centric parallelization of matrix multiplication (GEMM) and related computations in dense linear algebra. Whereas contemporary decompositions are primarily tile-based, our method assigns each physical processing element an even share of the aggregate inner-loop iterations. This provides near-perfect utilization of computing resources, regardless of how efficiently the output tiling for any given problem quantizes across the underlying processing elements.
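The contrast between tile-based and work-centric partitioning can be illustrated with a small host-side sketch. It is not the authors' implementation: the function names, the tile sizes, and the worker count are hypothetical, chosen only to show how an even split of inner-loop iterations avoids the idle slots left when the tile count does not divide evenly across processing elements.

```python
import math

def stream_k_ranges(m, n, k, blk_m, blk_n, blk_k, num_workers):
    """Split the aggregate inner-loop iterations evenly across workers.

    Hypothetical helper: returns one half-open range [start, end) of
    global iteration indices per worker, plus the tile count and the
    number of inner-loop iterations per output tile.
    """
    tiles = math.ceil(m / blk_m) * math.ceil(n / blk_n)
    iters_per_tile = math.ceil(k / blk_k)
    total_iters = tiles * iters_per_tile

    # Every worker gets `base` iterations; the first `extra` workers get one more.
    base, extra = divmod(total_iters, num_workers)
    ranges = []
    start = 0
    for w in range(num_workers):
        count = base + (1 if w < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges, tiles, iters_per_tile

def tile_based_utilization(tiles, num_workers):
    """Fraction of worker slots doing useful work when each worker
    processes whole tiles, one wave of tiles at a time."""
    waves = math.ceil(tiles / num_workers)
    return tiles / (waves * num_workers)

if __name__ == "__main__":
    # Assumed example: 384x384x4096 GEMM, 128x128x32 blocking, 4 workers.
    ranges, tiles, iters_per_tile = stream_k_ranges(384, 384, 4096, 128, 128, 32, 4)
    print("tiles:", tiles, "iterations per tile:", iters_per_tile)
    print("per-worker iteration ranges:", ranges)
    print("tile-based utilization:", tile_based_utilization(tiles, 4))
```

In this assumed configuration, nine output tiles on four workers need three waves under a tile-based schedule, so a quarter of the worker slots sit idle, while the iteration-level split gives every worker the same 288 iterations. A worker's range may start or end in the middle of a tile, so a real kernel would also have to combine partial sums for tiles shared between workers; that step is omitted here.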