Abstract: Manycore processors such as the Intel Xeon Phi and graphics processing units (GPUs) such as the NVIDIA Tesla series are prime examples of accelerators for applications that run on current multicore CPUs. It is therefore of interest to build fast, reliable linear system solvers that target these architectures. It is equally of interest to compare implementations of the same algorithm across architectures, in order to classify the optimizations and transformations that porting requires to achieve performance portability. In this work we present a detailed study of the adaptation and implementation of g-Spike for the Xeon Phi. g-Spike was originally developed to solve general tridiagonal systems on GPUs, where it delivers high performance and also solves systems on which other state-of-the-art general tridiagonal GPU solvers fail. The solver is based on the Spike framework and uses QR factorization without pivoting, implemented via Givens rotations. We describe the adaptations required on the Xeon Phi, which stem from significant differences in the programming models and underlying architectures, as well as from differences in the relative performance of data access and processing operations.
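To make the numerical kernel named in the abstract concrete, the following is a minimal serial sketch of solving a single tridiagonal system by QR factorization with Givens rotations and no pivoting. It is an illustration only and not the g-Spike implementation: it omits the Spike partitioning and all parallelism, and the function name `solve_tridiagonal_givens` and the argument layout (`lower[i]` holds A[i+1][i], `upper[i]` holds A[i][i+1]) are assumptions made for this example.

```python
import math


def solve_tridiagonal_givens(lower, diag, upper, rhs):
    """Solve A x = rhs for tridiagonal A via Givens-rotation QR (hypothetical helper).

    lower[i] = A[i+1][i] and upper[i] = A[i][i+1], both of length n-1;
    diag[i] = A[i][i], of length n. Raises ZeroDivisionError if A is singular.
    """
    n = len(diag)
    r0 = list(diag)            # main diagonal of R
    r1 = list(upper) + [0.0]   # first superdiagonal of R, padded to length n
    r2 = [0.0] * n             # second superdiagonal of R (fill-in)
    y = list(rhs)              # becomes Q^T * rhs

    for i in range(n - 1):
        a, b = r0[i], lower[i]
        h = math.hypot(a, b)
        if h == 0.0:
            continue           # column is already zero; identity rotation
        c, s = a / h, b / h    # rotation that zeroes A[i+1][i]

        # Apply the rotation to rows i and i+1 (columns i, i+1, i+2).
        r0[i] = h
        t1, t2 = r1[i], r0[i + 1]
        r1[i] = c * t1 + s * t2
        r0[i + 1] = -s * t1 + c * t2
        u = r1[i + 1]
        r2[i] = s * u
        r1[i + 1] = c * u

        # Apply the same rotation to the right-hand side.
        y[i], y[i + 1] = c * y[i] + s * y[i + 1], -s * y[i] + c * y[i + 1]

    # Back substitution with the banded upper triangular factor R.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = y[i]
        if i + 1 < n:
            acc -= r1[i] * x[i + 1]
        if i + 2 < n:
            acc -= r2[i] * x[i + 2]
        x[i] = acc / r0[i]
    return x
```

As a usage example, `solve_tridiagonal_givens([1.0, 1.0], [0.0, 0.0, 1.0], [1.0, 1.0], [2.0, 4.0, 5.0])` recovers approximately `[1.0, 2.0, 3.0]`. The leading diagonal entry of this matrix is zero, so an elimination without pivoting would break down at the first step, whereas the rotation-based factorization proceeds without any row exchanges.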