PGSampler: Accelerating GPU-Based Graph Sampling in GNN Systems via Workload Fusion

Published: 01 Jan 2024 · Last Modified: 16 Apr 2025 · CLUSTER 2024 · CC BY-SA 4.0
Abstract: Graph Neural Networks (GNNs) have demonstrated remarkable performance across various domains. Sample-based training, a practical strategy for training on large-scale graphs, is often bottlenecked by time-consuming graph sampling. GPU-based graph sampling has been introduced to address this, yet there is still room for further efficiency gains. Although several prior works accelerate the computation or memory access of GPU-based graph sampling, we show that the performance bottlenecks induced by small workloads cannot be ignored. In this paper, we propose PGSampler, an efficient system for accelerating GPU-based graph sampling. First, PGSampler leverages a barrier-free execution mode to fuse workloads, significantly improving resource utilization. By altering the sampling execution mode, PGSampler also reduces the preprocessing time before kernel execution, thereby accelerating the whole sampling process. Second, building on the new execution mode and accounting for the dynamically generated nature of sampling tasks, PGSampler adopts a persistent kernel design and assigns tasks through a task queue, achieving dynamic load balancing. Evaluations with diverse parameter settings show that PGSampler achieves up to a 2.22× speedup over the state-of-the-art GNN system DGL.
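
The persistent-kernel and task-queue design mentioned in the abstract can be pictured with a small sketch. The following CUDA fragment is a minimal illustration under stated assumptions, not PGSampler's actual implementation: all names (Task, persistent_sampler, FANOUT, the LCG generator) are hypothetical, and it assumes a CSR graph (indptr/indices) with uniform neighbor sampling with replacement.

```cuda
// Illustrative sketch only: a persistent kernel that pulls dynamically
// generated sampling tasks from a shared queue via an atomic counter.
// Names and layout are hypothetical, not taken from the PGSampler source.
#include <cuda_runtime.h>

#define FANOUT 4  // hypothetical per-vertex sample size

struct Task { int seed; };  // one sampling task: a seed vertex to expand

__device__ unsigned int lcg(unsigned int x) {  // tiny stand-in PRNG
    return x * 1664525u + 1013904223u;
}

// Each block stays resident on the GPU and repeatedly claims the next
// task until the queue is drained, instead of being launched per task.
__global__ void persistent_sampler(const int *indptr, const int *indices,
                                   const Task *queue, int num_tasks,
                                   int *counter, int *out) {
    __shared__ int task_id;
    while (true) {
        if (threadIdx.x == 0)
            task_id = atomicAdd(counter, 1);   // claim one task per block
        __syncthreads();
        if (task_id >= num_tasks) return;      // queue drained: block exits

        int v   = queue[task_id].seed;
        int beg = indptr[v];
        int deg = indptr[v + 1] - beg;
        // Each thread fills one output slot (uniform, with replacement).
        for (int i = threadIdx.x; i < FANOUT; i += blockDim.x) {
            unsigned int r = lcg((unsigned int)(task_id * FANOUT + i));
            out[task_id * FANOUT + i] = (deg > 0) ? indices[beg + r % deg] : -1;
        }
        __syncthreads();  // finish before thread 0 overwrites task_id
    }
}
```

In this pattern the kernel is launched once with a fixed, occupancy-sized grid; blocks balance load dynamically because a block that draws a low-degree vertex simply loops back and claims the next task, rather than idling behind a static per-task assignment.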