Implementing Asynchronous Jacobi Iteration on GPUs

Yu-Hsiang Mike Tsai, Pratik Nayak, Edmond Chow, Hartwig Anzt

Published: 01 Jan 2022, Last Modified: 12 May 2023, ScalAH@SC 2022, Readers: Everyone

Abstract: Computation on architectures that feature fine-grained parallelism requires algorithms that overcome load imbalance, inefficient memory access, serialization, and excessive synchronization. In this paper, we explore an algorithm for iteratively solving systems of linear equations that allows different execution units to update the solution asynchronously and completely removes the need for synchronization. Methods of this type have been identified as potentially competitive for computations on exascale machines, but practical implementations for GPU platforms have scarcely been studied. We present an asynchronous Jacobi iteration optimized for high-end GPUs, demonstrate the superiority of the algorithm over a highly tuned synchronous Jacobi iteration, and deploy the algorithm as a production-ready implementation in the Ginkgo open-source library. The ideas presented here on algorithm design, implementation, and performance can help guide the design of other asynchronous iterative methods on GPUs.
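
The contrast between synchronous and asynchronous Jacobi updates can be illustrated with a minimal sequential sketch in Python. This is not the paper's GPU kernel and not Ginkgo's API: the function names, the block size, the random block scheduling, and the test matrix are all assumptions made for illustration. The sketch only emulates asynchrony on one thread by letting blocks of unknowns update in arbitrary order and read whatever values are currently in the shared vector, instead of waiting for a complete previous iterate.

```python
import random


def make_tridiagonal(n, diag=4.0, off=-1.0):
    """Hypothetical test matrix: strictly diagonally dominant tridiagonal."""
    return [[diag if i == j else off if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def sync_jacobi(A, b, sweeps):
    """Classical Jacobi: every update in a sweep reads the previous iterate."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        x_old = x[:]  # this copy plays the role of a global synchronization
        for i in range(n):
            s = sum(A[i][j] * x_old[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


def async_jacobi_emulated(A, b, sweeps, block_size=2, seed=0):
    """Sequential emulation of asynchronous Jacobi (illustrative assumption).

    Blocks of unknowns are updated in a random order with no barrier between
    them, so a block may read a mix of old and already-updated values.
    """
    n = len(b)
    x = [0.0] * n
    rng = random.Random(seed)
    blocks = [range(start, min(start + block_size, n))
              for start in range(0, n, block_size)]
    for _ in range(sweeps):
        order = blocks[:]
        rng.shuffle(order)  # stands in for unpredictable hardware scheduling
        for block in order:
            snapshot = x[:]  # values visible to this block when it runs
            for i in block:
                s = sum(A[i][j] * snapshot[j] for j in range(n) if j != i)
                x[i] = (b[i] - s) / A[i][i]  # written straight to shared x
    return x


def residual_norm(A, b, x):
    """Infinity norm of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
               for i in range(n))
```

For a strictly diagonally dominant matrix such as the one built by `make_tridiagonal`, both variants converge; the point of the asynchronous form is that no update ever waits for the others to finish.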
