How Does Software Prefetching Work on GPU Query Processing?

Yangshen Deng; Shiwen Chen; Zhaoyang Hong; Bo Tang

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, Venue: DaMoN 2024, License: CC BY-SA 4.0

Abstract: Improving the performance of GPU query processing is a well-studied problem in the database community. However, performance remains unsatisfactory because GPU memory bandwidth is poorly utilized. In CPU databases, software prefetching is a common way to improve bandwidth utilization, because it overlaps computation with memory access latency. GPU databases have so far ignored it, even though modern GPU architectures (i.e., NVIDIA Ampere and later) provide software prefetching support. To investigate the effectiveness of software prefetching on GPU query processing, we implement four software prefetching algorithms on the GPU in this work: Group Prefetch (GP), Software-Pipelined Prefetch (SPP), Asynchronous Memory Access Chaining (AMAC), and Interleaved Multi-Vectorizing (IMV). We then adapt them to hash join probe and BTree search with a suite of optimizations. Finally, we conduct comprehensive experiments to evaluate their performance. The results confirm the benefit of software prefetching for GPU query processing: it achieves up to 1.19X speedup on hash join probe and up to 1.31X speedup on BTree search compared with implementations without software prefetching.
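
To make the idea of overlapping computation with memory latency concrete, the following is a minimal CPU-side sketch of the Group Prefetch pattern applied to a hash join probe. It is not the authors' GPU implementation. The `HashTable` layout, the multiplicative hash, the group size `G`, and the function name `probe_group_prefetch` are all assumptions made for illustration, and `__builtin_prefetch` is a GCC/Clang builtin rather than a GPU primitive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical open-addressing hash table with linear probing (not from the paper).
// Assumes keys never equal kEmpty and the table always keeps at least one free slot.
struct HashTable {
    static constexpr std::uint32_t kEmpty = 0xFFFFFFFFu;  // marks a free slot
    struct Slot { std::uint32_t key; std::uint32_t value; };
    std::vector<Slot> slots;
    std::size_t mask;

    explicit HashTable(std::size_t capacity_pow2)
        : slots(capacity_pow2, Slot{kEmpty, 0}), mask(capacity_pow2 - 1) {
        assert(capacity_pow2 != 0 && (capacity_pow2 & mask) == 0);
    }

    // Home slot of a key (simple multiplicative hash, an illustrative choice).
    std::size_t home(std::uint32_t key) const {
        return static_cast<std::size_t>(key * 2654435761u) & mask;
    }

    void insert(std::uint32_t key, std::uint32_t value) {
        std::size_t i = home(key);
        while (slots[i].key != kEmpty && slots[i].key != key) i = (i + 1) & mask;
        slots[i] = Slot{key, value};
    }
};

// Probes `keys` in groups of G. Returns one value per key, or kEmpty on a miss.
std::vector<std::uint32_t> probe_group_prefetch(const HashTable& ht,
                                                const std::vector<std::uint32_t>& keys) {
    constexpr std::size_t G = 16;  // group size: an assumed tuning knob
    std::vector<std::uint32_t> out(keys.size(), HashTable::kEmpty);
    std::size_t slot_idx[G];

    for (std::size_t base = 0; base < keys.size(); base += G) {
        const std::size_t n = std::min(G, keys.size() - base);

        // Stage 1: hash every key in the group and issue a prefetch for its
        // home slot, so the memory requests are in flight at the same time.
        for (std::size_t j = 0; j < n; ++j) {
            slot_idx[j] = ht.home(keys[base + j]);
            __builtin_prefetch(&ht.slots[slot_idx[j]], /*rw=*/0, /*locality=*/1);
        }

        // Stage 2: visit the slots, which are now likely to be cached.
        for (std::size_t j = 0; j < n; ++j) {
            std::size_t i = slot_idx[j];
            while (ht.slots[i].key != HashTable::kEmpty) {
                if (ht.slots[i].key == keys[base + j]) {
                    out[base + j] = ht.slots[i].value;
                    break;
                }
                i = (i + 1) & ht.mask;  // linear probing
            }
        }
    }
    return out;
}
```

In this sketch, the hashing work for later keys in a group overlaps with the outstanding memory requests for earlier keys, which is the effect the abstract attributes to software prefetching. The prefetch only covers the home slot, so long probe chains would still stall. SPP, AMAC, and IMV schedule the same two stages differently and are not shown here.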
