Performance Portable Optimizations of an Ice-sheet Modeling Code on GPU-supercomputers

Oscar Antepara; Samuel Williams; Max Carlson; Jerry Watkins

Published: 01 Jan 2024, Last Modified: 14 May 2025, SC Workshops 2024, License: CC BY-SA 4.0

Abstract: In this paper, we present GPU optimizations for an ice-sheet modeling code known as MPAS-Albany Land Ice (MALI). MALI is a templated C++ code that leverages the Kokkos programming model for portability and the Trilinos library for data structures, nonlinear and linear solvers, and optimization packages for ice-sheet simulations. The performance of the most expensive kernel is assessed via the Roofline model to highlight the potential for code improvement on the underlying GPU architecture. We perform a collection of optimizations, consisting of loop fusion, loop optimizations, and local accumulation, to productively and portably attain an overall speedup of 3× on either NVIDIA or AMD GPUs. We analyze the performance gains using a time-oriented performance portability model based on time per invocation and GPU data movement. Results show an increase of 20% to 50% in the performance portability metric, obtained by improving data locality in the GPU kernels of a Stokes solver, and highlight the importance of optimizing GPU-ported scientific applications to maximize achieved memory bandwidth and minimize data movement on modern supercomputers.
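To make "local accumulation" concrete, the sketch below contrasts two versions of a generic finite-element residual loop. It is a minimal serial C++ illustration under stated assumptions, not MALI's code and not Kokkos: the array names `wBF`, `force`, `res`, the `Dims` struct, and the flattened `[cell][node][qp]` layout are all hypothetical. In the baseline, the output entry is updated in memory at every quadrature point; because the compiler generally cannot prove that `res` does not alias the inputs, it may have to issue a load and store on each update. The local version keeps the partial sum in a scalar and stores it once.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical problem sizes; not MALI's actual data layout.
struct Dims {
  std::size_t cells;  // number of elements
  std::size_t nodes;  // basis functions per element
  std::size_t qps;    // quadrature points per element
};

// Baseline: the output entry is updated in memory once per quadrature
// point, i.e. `qps` read-modify-write operations per (cell, node) pair.
void residual_global(const Dims& d,
                     const std::vector<double>& wBF,    // [cell][node][qp]
                     const std::vector<double>& force,  // [cell][qp]
                     std::vector<double>& res) {        // [cell][node]
  assert(res.size() == d.cells * d.nodes);
  for (std::size_t c = 0; c < d.cells; ++c) {
    for (std::size_t n = 0; n < d.nodes; ++n) {
      const std::size_t r = c * d.nodes + n;
      res[r] = 0.0;
      for (std::size_t q = 0; q < d.qps; ++q) {
        res[r] += force[c * d.qps + q] * wBF[r * d.qps + q];
      }
    }
  }
}

// Local accumulation: the partial sum lives in a local scalar (a register
// on a GPU) and the output array is written exactly once per pair.
void residual_local(const Dims& d,
                    const std::vector<double>& wBF,
                    const std::vector<double>& force,
                    std::vector<double>& res) {
  assert(res.size() == d.cells * d.nodes);
  for (std::size_t c = 0; c < d.cells; ++c) {
    for (std::size_t n = 0; n < d.nodes; ++n) {
      const std::size_t r = c * d.nodes + n;
      double acc = 0.0;
      for (std::size_t q = 0; q < d.qps; ++q) {
        acc += force[c * d.qps + q] * wBF[r * d.qps + q];
      }
      res[r] = acc;  // single store per (cell, node)
    }
  }
}

// Largest element-wise difference between two equally sized arrays.
double max_abs_diff(const std::vector<double>& a,
                    const std::vector<double>& b) {
  assert(a.size() == b.size());
  double m = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    m = std::fmax(m, std::fabs(a[i] - b[i]));
  }
  return m;
}
```

Both functions compute the same result; only the memory traffic on the output array differs. In a Kokkos port, one would expect the outer loops to map to a parallel loop over cells, and the same change would keep the running sum in registers instead of device memory, which is the kind of data-locality improvement the abstract attributes its gains to.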

Loading