PINCH: Accelerating Distributed GNN Training through In-Kernel Operation Using eBPF

Published: 30 May 2024, Last Modified: 16 Jun 2024 · MLArchSys 2024 Oral · CC BY 4.0
Workshop Track: System for Machine Learning
Presentation: Virtual
Keywords: Distributed GNN Training, In-Kernel Optimization, Aggregation and Broadcasting, eBPF Offloading
Presenter Full Name: Jianchang Su
Presenter Email: jianchang.su@uconn.edu
Abstract: Graph Neural Networks (GNNs) have achieved state-of-the-art performance on various graph-related tasks, but training GNNs on large-scale graphs remains challenging due to the high communication overhead of neighborhood aggregation. Existing distributed GNN training frameworks suffer from communication inefficiencies caused by frequent data movement between user and kernel space and by the use of generic communication primitives. We introduce PINCH, a system designed to accelerate distributed GNN training by using eBPF with the XDP and TC kernel hooks to shift communication-heavy operations into kernel space. PINCH uses three main techniques: (1) in-kernel neighborhood aggregation via eBPF and XDP to cut communication costs, (2) in-kernel broadcasting via eBPF and TC to minimize user-kernel transitions and passes through the network stack, and (3) caching and reusing aggregated embeddings in eBPF maps to reduce redundant data processing. Together, these techniques aim to alleviate the communication bottleneck and accelerate overall training.
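To make technique (3) concrete, the sketch below is a hypothetical user-space analogy of caching and reusing aggregated neighborhood embeddings. In PINCH itself this cache lives in an eBPF map inside the kernel and is accessed from eBPF programs at the XDP/TC hooks; here a plain Python dict stands in for the map, and the class name, mean aggregation, and hit/miss counters are illustrative assumptions, not PINCH's actual interface.

```python
# Illustrative sketch (assumed names): a per-node cache of aggregated
# neighborhood embeddings. A dict stands in for an eBPF hash map
# (node ID -> aggregated embedding) to show the reuse logic only.
from typing import Dict, List

Embedding = List[float]

class AggregationCache:
    def __init__(self) -> None:
        self._map: Dict[int, Embedding] = {}  # stand-in for the eBPF map
        self.hits = 0
        self.misses = 0

    def aggregate(self, node: int, neighbors: List[Embedding]) -> Embedding:
        cached = self._map.get(node)
        if cached is not None:
            self.hits += 1
            return cached  # reuse: skip redundant aggregation entirely
        self.misses += 1
        # Mean aggregation over neighbor embeddings (one common GNN choice).
        dim = len(neighbors[0])
        agg = [sum(e[i] for e in neighbors) / len(neighbors)
               for i in range(dim)]
        self._map[node] = agg  # populate the cache for later reuse
        return agg

cache = AggregationCache()
neigh = [[1.0, 3.0], [3.0, 5.0]]
first = cache.aggregate(7, neigh)   # miss: computes the mean -> [2.0, 4.0]
second = cache.aggregate(7, neigh)  # hit: served from the cached entry
```

In the in-kernel setting, the lookup and update would instead use BPF map helpers, so repeated aggregation requests for hot nodes can be answered without crossing into user space at all.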
Presenter Bio: Jianchang Su is a Ph.D. student at UConn, in the Department of Computer Science & Engineering. His research interests include cloud computing, serverless computing, and machine learning systems.
Paper Checklist Guidelines: I certify that all co-authors have validated the presented results and conclusions, and have read and commit to adhering to the Paper Checklist Guidelines, Call for Papers and Publication Ethics.
YouTube Link: https://www.youtube.com/watch?v=w0GjrzzlauY
Slides: pdf
Workshop Registration: Yes, at least one of the authors has registered for the workshop (Two-Day Registration at minimum).
Submission Number: 10