Evaluation of SYCL's Different Data Parallel Kernels

Published: 01 Jan 2024, Last Modified: 05 Jun 2024, Venue: IWOCL 2024, License: CC BY-SA 4.0
Abstract: SYCL offers programmers four ways to write and invoke a device kernel, and AdaptiveCpp adds a fifth. This paper analyzes the performance of these kernel invocation types for two SYCL implementations, DPC++ and AdaptiveCpp, on an NVIDIA A100 GPU, an AMD Instinct MI210 GPU, and a dual-socket AMD EPYC 9274F CPU. Using kernel matrix assembly as an example, we show why performance on the same hardware and for the same problem can differ by a factor of 100 in the worst case, depending on the SYCL implementation and kernel invocation type.
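To make the benchmark problem concrete, the following is a minimal host-side sketch of a kernel matrix assembly in plain C++. It is not the paper's code and deliberately uses no SYCL API. The linear kernel, the dense row-major layout, and the names `linear_kernel` and `assemble_kernel_matrix` are assumptions chosen for illustration; the abstract does not state which kernel function or storage scheme the authors use.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (assumption): linear kernel k(x, y) = <x, y>.
// The abstract does not say which kernel function the paper evaluates.
double linear_kernel(const std::vector<double>& x, const std::vector<double>& y) {
    double sum = 0.0;
    for (std::size_t d = 0; d < x.size(); ++d) {
        sum += x[d] * y[d];
    }
    return sum;
}

// Assemble the dense n x n kernel matrix K with K[i * n + j] = k(x_i, x_j).
// Every (i, j) entry is independent of all others, so in a data parallel
// formulation each entry could be computed by its own work-item.
std::vector<double> assemble_kernel_matrix(const std::vector<std::vector<double>>& points) {
    const std::size_t n = points.size();
    std::vector<double> K(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {      // row index
        for (std::size_t j = 0; j < n; ++j) {  // column index
            K[i * n + j] = linear_kernel(points[i], points[j]);
        }
    }
    return K;
}
```

In this sketch the two outer loops span the two-dimensional iteration space, and the loop body is the part that would become the device kernel. How that iteration space is expressed and partitioned into work-groups is what distinguishes the kernel invocation types the paper compares.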