Abstract: Recent years have witnessed phenomenal growth in the application and capabilities of graphics processing units (GPUs), owing to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU program (kernel) is challenging, and generally only certain specific kernel configurations lead to significant increases in performance. Auto-tuning is the process of automatically optimizing software for highly efficient execution on a target hardware platform. It is particularly useful for GPU programming, as a single kernel requires retuning after code changes, for different input data, and for different architectures. However, the discrete and nonconvex nature of the search space creates a challenging optimization problem. In this work, we investigate which algorithm produces the fastest kernels as the time budget for the tuning task is varied. We conduct a survey by performing experiments on 26 different kernel spaces, from nine different GPUs, for 16 different evolutionary black-box optimization algorithms. We then analyze these results and introduce a novel metric, based on the PageRank centrality concept, as a tool for gaining insight into the difficulty of the optimization problem. We demonstrate that our metric correlates strongly with the observed tuning performance.