Accelerated Auto-Tuning of GPU Kernels for Tensor Computations

Chendi Li; Yufan Xu; Sina Mahdipour Saravani; Ponnuswamy Sadayappan

Published: 01 Jan 2024, Last Modified: 13 Nov 2024; ICS 2024; CC BY-SA 4.0

Abstract: TVM is a state-of-the-art auto-tuning compiler for the synthesis of high-performance implementations of tensor computations. However, an extensive search in the vast design space via thousands of compile-execute trials is often needed to identify high-performance code versions, leading to high auto-tuning time. This paper develops new performance modeling and design space exploration strategies to accelerate the code optimization process within TVM. Experimental evaluation on a number of matrix-matrix multiplication and 2D convolution kernels demonstrates about an order-of-magnitude improvement in auto-tuning time to achieve the same level of code performance.
