Abstract: In this paper, we present GPU optimizations for an ice-sheet modeling code known as MPAS-Albany Land Ice (MALI). MALI is a C++ template code that leverages the Kokkos programming model for portability and the Trilinos library for data structures, nonlinear and linear solvers, and optimization packages for ice-sheet simulations. We assess the performance of the most expensive kernel with the Roofline model to highlight the potential for code improvement on the underlying GPU architecture. We apply a collection of optimizations consisting of loop fusion, loop optimizations, and local accumulation to productively and portably attain an overall speedup of 3× on both NVIDIA and AMD GPUs. We analyze the performance gains using a time-oriented performance portability model based on time per invocation and GPU data movement. Results show an increase of between 20% and 50% in the performance portability metric, obtained by improving data locality in the GPU kernels of a Stokes solver, and highlight the importance of optimizing GPU-ported scientific applications to maximize memory bandwidth and minimize data movement on modern supercomputers.
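As a rough illustration of what loop fusion and local accumulation mean in this context, the sketch below contrasts two ways of summing per-cell contributions in plain, serial C++. It is not taken from MALI, Kokkos, or Trilinos: the function names `accumulate_baseline` and `accumulate_fused`, the flat array layout, and the two input fields `a` and `b` are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical baseline: two separate passes over the cells, each one
// updating the output array in place at every quadrature point.
std::vector<double> accumulate_baseline(const std::vector<double>& a,
                                        const std::vector<double>& b,
                                        std::size_t cells, std::size_t qps) {
    assert(a.size() == cells * qps && b.size() == cells * qps);
    std::vector<double> r(cells, 0.0);
    for (std::size_t c = 0; c < cells; ++c)
        for (std::size_t q = 0; q < qps; ++q)
            r[c] += a[c * qps + q];   // read-modify-write of r[c] each step
    for (std::size_t c = 0; c < cells; ++c)
        for (std::size_t q = 0; q < qps; ++q)
            r[c] += b[c * qps + q];   // second pass revisits r[c]
    return r;
}

// Hypothetical optimized variant: the two passes are fused into one loop,
// and the sum is kept in a local variable that is written out once per cell.
std::vector<double> accumulate_fused(const std::vector<double>& a,
                                     const std::vector<double>& b,
                                     std::size_t cells, std::size_t qps) {
    assert(a.size() == cells * qps && b.size() == cells * qps);
    std::vector<double> r(cells, 0.0);
    for (std::size_t c = 0; c < cells; ++c) {
        double acc = 0.0;             // local accumulator
        for (std::size_t q = 0; q < qps; ++q)
            acc += a[c * qps + q] + b[c * qps + q];
        r[c] = acc;                   // single store per cell
    }
    return r;
}
```

In a GPU kernel the analogous change would typically keep the accumulator in a register and reduce traffic to global memory, which is the kind of data-movement saving the abstract refers to. Note that the fused version adds the terms in a different order, so floating-point results may differ from the baseline in the last bits.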