Graph Neural Networks (GNNs) have achieved state-of-the-art performance on a wide range of graph-related tasks, but training GNNs on large-scale graphs remains challenging because neighborhood aggregation incurs high communication overhead. Existing distributed GNN training frameworks suffer from communication inefficiencies caused by frequent data movement between user and kernel space and by their reliance on generic communication primitives.
We introduce PINCH, a novel system designed to accelerate distributed GNN training. PINCH employs eBPF together with the XDP and TC kernel hooks to shift communication-heavy operations into kernel space. It relies on three main techniques: (1) in-kernel neighborhood aggregation via eBPF and XDP to cut communication costs, (2) in-kernel broadcasting via eBPF and TC to minimize user-kernel transitions and redundant traversals of the network stack, and (3) caching and reusing aggregated embeddings in eBPF maps to avoid redundant data processing. Together, these techniques aim to alleviate the communication bottleneck and accelerate overall training.
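The abstract does not spell out implementation details, so the following is only an illustrative sketch of the embedding-caching idea: an eBPF hash map keyed by node ID holds aggregated embeddings, and an XDP program consults that cache for incoming embedding-fetch requests before they ever reach user space. All names (emb_cache, cache_stats, EMB_PORT, EMB_DIM), the packet layout, and the fixed-point value representation are assumptions for illustration, not PINCH's actual design.

```c
/* Illustrative sketch only (not PINCH's code): cache aggregated embeddings
 * in an eBPF hash map and count cache hits/misses in an XDP program. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define EMB_DIM  64        /* hypothetical embedding dimension */
#define EMB_PORT 9099      /* hypothetical UDP port for embedding requests */

/* Cached aggregated embedding; stored fixed-point because eBPF programs
 * cannot execute floating-point instructions. */
struct emb_val {
    __u32 round;               /* aggregation round this entry belongs to */
    __s32 data[EMB_DIM];       /* fixed-point embedding values */
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1 << 20);
    __type(key, __u32);        /* node ID */
    __type(value, struct emb_val);
} emb_cache SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 2);    /* index 0 = cache hits, 1 = misses */
    __type(key, __u32);
    __type(value, __u64);
} cache_stats SEC(".maps");

SEC("xdp")
int xdp_emb_cache(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    /* Parse Ethernet / IPv4 / UDP headers with bounds checks for the verifier.
     * For simplicity this assumes no VLAN tags and no IP options. */
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_UDP)
        return XDP_PASS;

    struct udphdr *udp = (void *)(ip + 1);
    if ((void *)(udp + 1) > data_end || udp->dest != bpf_htons(EMB_PORT))
        return XDP_PASS;

    /* Hypothetical request format: the payload starts with a 32-bit node ID. */
    __u32 *node_id = (void *)(udp + 1);
    if ((void *)(node_id + 1) > data_end)
        return XDP_PASS;

    struct emb_val *cached = bpf_map_lookup_elem(&emb_cache, node_id);
    __u32 idx = cached ? 0 : 1;
    __u64 *cnt = bpf_map_lookup_elem(&cache_stats, &idx);
    if (cnt)
        (*cnt)++;              /* per-CPU map, so no atomics needed */

    /* On a hit, a full implementation could answer from the kernel (e.g.,
     * rewrite the packet and return XDP_TX); this sketch passes everything up. */
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```

In a setup like this, a user-space loader built on libbpf would attach the program to the NIC (e.g., with bpf_xdp_attach), populate emb_cache after each aggregation round via bpf_map_update_elem, and read cache_stats to gauge how much redundant communication the cache absorbs; how PINCH actually manages and invalidates its cache is not described in this section.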