Peridot: Accelerating Out-of-Core GCN Data Reuse Pattern and Co-Design on GPU

Published: 30 May 2024, Last Modified: 08 Jun 2024 · MLArchSys 2024 Oral · CC BY 4.0
Workshop Track: Machine Learning for Systems
Presentation: Virtual
Keywords: Graph Convolutional Networks (GCNs), sparse general matrix-matrix multiplication (SpGEMM), Peridot, Out-of-Core
Presenter Full Name: Shakya Jayakody
TL;DR: Peridot is a framework that couples NVIDIA GPUDirect Storage (GDS) with sparsity-aware memory allocation strategies to accelerate out-of-core GCNs on GPUs.
Presenter Email: shakya@ucf.edu
Abstract: Graph Convolutional Networks (GCNs) are pivotal in a diverse array of applications, including scientific research, engineering, biomedical protein-protein interaction (PPI) analysis, and natural language processing (NLP). The demand for efficient GCN computation has catalyzed extensive research into GPU acceleration techniques. However, a persistent challenge in this domain is managing out-of-core data that exceeds the capacity of limited GPU memory, which incurs significant data-movement latency and leaves GPU computational resources underutilized. This paper introduces Peridot, a framework that couples NVIDIA GPUDirect Storage (GDS) technology with sparsity-aware memory allocation strategies to improve the performance of out-of-core GCNs on GPUs. Peridot significantly reduces data-movement latency by orchestrating efficient transfers of sparse matrix data between the GPU and system memory, particularly during sparse matrix chain multiplication, and it optimizes GPU memory usage to support larger matrices than previously possible. The system incorporates a dynamic memory allocation scheme tailored to the sparsity patterns of the matrices, reducing unnecessary memory consumption and improving data locality. In addition, by using GDS, Peridot enables direct, high-speed data transfers between storage devices and GPU memory, bypassing the CPU and avoiding the overhead of traditional data-transfer paths. Our evaluations demonstrate that Peridot substantially outperforms the baselines, achieving considerably lower latency on both synthetic and real-world graph benchmarks. The improvements are most pronounced for large GCN workloads, where inefficient data movement and memory allocation have historically been the main obstacles.
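The abstract's GDS-based data path corresponds to NVIDIA's cuFile API, which DMAs data from NVMe storage directly into GPU memory without staging through a CPU bounce buffer. The sketch below is a minimal illustration of that standard API, not Peridot's actual implementation; the file name `matrix_chunk.bin` and the 1 MiB chunk size are hypothetical placeholders, and error handling is abbreviated.

```c
// Minimal cuFile (GPUDirect Storage) read sketch: storage -> GPU memory.
// Build (assumption): nvcc gds_read.c -lcufile -o gds_read
#define _GNU_SOURCE          /* for O_DIRECT */
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* Open the GDS driver. */
    CUfileError_t status = cuFileDriverOpen();
    if (status.err != CU_FILE_SUCCESS) { fprintf(stderr, "cuFileDriverOpen failed\n"); return 1; }

    /* GDS requires the file to be opened with O_DIRECT. */
    int fd = open("matrix_chunk.bin", O_RDONLY | O_DIRECT);  /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    /* Register the file descriptor with cuFile. */
    CUfileDescr_t descr;
    memset(&descr, 0, sizeof(descr));
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    status = cuFileHandleRegister(&handle, &descr);
    if (status.err != CU_FILE_SUCCESS) { fprintf(stderr, "handle register failed\n"); return 1; }

    /* Allocate the destination GPU buffer and pre-register it for faster DMA. */
    const size_t nbytes = 1 << 20;   /* 1 MiB chunk; size is illustrative only */
    void *dev_buf = NULL;
    cudaMalloc(&dev_buf, nbytes);
    cuFileBufRegister(dev_buf, nbytes, 0);

    /* Direct read: NVMe -> GPU memory, bypassing the CPU bounce buffer. */
    ssize_t nread = cuFileRead(handle, dev_buf, nbytes, /*file_offset=*/0, /*buf_offset=*/0);
    if (nread < 0) fprintf(stderr, "cuFileRead failed\n");

    /* Teardown. */
    cuFileBufDeregister(dev_buf);
    cudaFree(dev_buf);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```

In an out-of-core SpGEMM chain of the kind the abstract describes, a loop over such reads would stream one matrix tile at a time into a GPU buffer sized from that tile's nonzero count, which is presumably where Peridot's sparsity-tailored allocation comes in.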
Presenter Bio: Shakya Jayakody received BS and MS degrees in electrical engineering from Louisiana Tech University in 2016 and 2020, respectively. He is working toward a PhD in the Department of Electrical and Computer Engineering at the University of Central Florida, Orlando. His research interests include memory systems, sparse matrix algorithms, machine learning, and graph algorithms.
Paper Checklist Guidelines: I certify that all co-authors have validated the presented results and conclusions, and have read and commit to adhering to the Paper Checklist Guidelines, Call for Papers and Publication Ethics.
YouTube Link: https://youtu.be/8mTthAVSxWU?si=QexKuFZ_QpKhbak7
Dataset Release: I certify that all co-authors commit to release the dataset and necessary scripts to reproduce the presented results.
Slides: pdf
Workshop Registration: Yes, at least one of the authors has registered for the workshop (Two-Day Registration at minimum).
Submission Number: 6