Optimizing Multi-Grid Preconditioned Conjugate Gradient Method on Multi-Cores

Published: 2024, Last Modified: 17 Jul 2025. IEEE Trans. Parallel Distributed Syst. 2024. CC BY-SA 4.0
Abstract: Multigrid preconditioned conjugate gradient (MGPCG) is commonly used in high-performance computing (HPC) workloads. However, MGPCG is notoriously challenging to optimize because most of its computation kernels are memory-bound, with low arithmetic intensity and non-trivial communication patterns among parallel processes. This article presents new techniques to improve the data locality and reduce the communication overhead of MGPCG. We first merge the kernels of multigrid (MG) and then develop an asynchronous neighboring communication algorithm to reduce data communication across parallel processes. We demonstrate the benefits of our approach by applying it to the high-performance conjugate gradient (HPCG) benchmark and integrating it with a real-life algebraic multigrid package. We test the resulting software implementations on three ARMv8 systems and one Intel Xeon system. Experimental results show that our approach leads to a 1.62x-2.54x speedup over the engineer- and vendor-tuned HPCG implementations across various workloads and platforms.
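For readers unfamiliar with the method, the following is a minimal sketch of a preconditioned conjugate gradient loop, intended only to show where the preconditioner sits in each iteration. It is not the article's implementation: the 1D Poisson operator, the Jacobi preconditioner (a stand-in for a multigrid cycle), and all function names are assumptions made for illustration.

```python
# Illustrative sketch only: names, operator, and preconditioner are assumptions,
# not taken from the article. In MGPCG, `precond` would run a multigrid cycle.

def matvec_poisson_1d(x):
    """Apply the 1D Poisson stencil [-1, 2, -1], a simple SPD operator."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def jacobi_precond(r):
    """Stand-in preconditioner: scale by the inverse of the diagonal (2.0)."""
    return [ri / 2.0 for ri in r]


def dot(a, b):
    """Dot product; a low-arithmetic-intensity, memory-bound kernel."""
    return sum(ai * bi for ai, bi in zip(a, b))


def pcg(matvec, precond, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for SPD A; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                 # residual for the zero initial guess
    z = precond(r)              # preconditioned residual
    p = list(z)                 # initial search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = matvec(p)          # sparse matrix-vector product
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = precond(r)          # the multigrid cycle would go here in MGPCG
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


b = [1.0] * 16
x, iterations = pcg(matvec_poisson_1d, jacobi_precond, b)
```

Each iteration streams over the vectors several times while doing only a few floating-point operations per element, which is the kind of memory-bound behavior the abstract refers to.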