Abstract: Matrix multiplication is central to scientific computing, but it demands substantial computational resources. We propose a framework for effectively utilizing heterogeneous GPUs for large matrix multiplication. By splitting matrices into small blocks and applying Douglas’s variant of Strassen’s algorithm, we enable tasks to run concurrently on heterogeneous systems. Our framework improves speed by 89.5% on homogeneous GPU servers and by 108% in multi-server heterogeneous GPU setups.
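As an illustration of why block splitting exposes concurrency, the following minimal sketch applies one level of the classic Strassen recursion to a square matrix of even dimension. It is not the framework described in the abstract: it uses the textbook Strassen formulas rather than Douglas's variant, a thread pool stands in for a set of GPUs, and the helper names (`split`, `matmul`, `strassen_one_level`) are hypothetical. The point is only that the seven block products are mutually independent and can therefore be dispatched as separate tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def add(X, Y):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def matmul(X, Y):
    """Plain product; a stand-in for a per-device GEMM call (assumption)."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def split(M):
    """Split a square matrix of even dimension into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen_one_level(A, B, workers=7):
    """One level of classic Strassen; the 7 block products run as tasks."""
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Each pair is one independent block multiplication (M1..M7).
    tasks = [
        (add(A11, A22), add(B11, B22)),  # M1
        (add(A21, A22), B11),            # M2
        (A11, sub(B12, B22)),            # M3
        (A22, sub(B21, B11)),            # M4
        (add(A11, A12), B22),            # M5
        (sub(A21, A11), add(B11, B12)),  # M6
        (sub(A12, A22), add(B21, B22)),  # M7
    ]
    # Threads model the concurrent devices; pool.map preserves task order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        M1, M2, M3, M4, M5, M6, M7 = pool.map(lambda t: matmul(*t), tasks)
    # Recombine the seven products into the four quadrants of C.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

Calling `strassen_one_level(A, B)` on two 4×4 integer matrices returns the same result as `matmul(A, B)` while performing seven 2×2 block products instead of eight. In a heterogeneous setting, a scheduler would presumably assign these tasks to devices according to their throughput; that policy is outside the scope of this sketch.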
External IDs: dblp:conf/icpads/SunLCL23