Accelerating Large-Scale Out-of-GPU-Core GNN Training with Two-Level Historical Caching

Published: 2025 · Last Modified: 10 Jan 2026 · APPT 2025 · CC BY-SA 4.0
Abstract: Large-scale graph neural network (GNN) training systems that span GPUs, CPU memory, and storage face the challenge of caching embedding data efficiently while preserving accuracy. In this paper, we propose HCGNN, an out-of-GPU-memory GNN training system that combines GPU-based sampling with historical embedding caching. The system supports dynamic caching of embedding data through a heuristic two-level historical cache design that achieves a high cache hit ratio with lightweight proactive data eviction. Compared with state-of-the-art frameworks, HCGNN achieves up to 6.7x speedup on graph sampling and 4.3x speedup on feature gathering with less than 0.5% accuracy loss.
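To illustrate the idea of a two-level historical embedding cache with heuristic proactive eviction, below is a minimal Python/PyTorch sketch. It assumes a small level-1 cache resident in GPU memory backed by a larger level-2 cache in host memory, and uses a simple access-frequency heuristic for eviction; the class and method names (`TwoLevelHistoricalCache`, `lookup`, `insert`) are illustrative and are not HCGNN's actual API.

```python
# Hypothetical sketch of a two-level historical embedding cache.
# Level 1 lives in GPU memory, level 2 in host memory (pinned in practice).
# The eviction heuristic here is access frequency; HCGNN's heuristic may differ.
import torch


class TwoLevelHistoricalCache:
    def __init__(self, dim, l1_capacity, l2_capacity,
                 device="cuda" if torch.cuda.is_available() else "cpu"):
        self.dim = dim
        self.l1 = {}               # node id -> embedding tensor on `device`
        self.l1_cap = l1_capacity
        self.l2 = {}               # node id -> embedding tensor on CPU
        self.l2_cap = l2_capacity
        self.hits = {}             # node id -> access count (heuristic score)
        self.device = device

    def lookup(self, nid):
        """Return a cached historical embedding, promoting L2 hits to L1."""
        self.hits[nid] = self.hits.get(nid, 0) + 1
        if nid in self.l1:
            return self.l1[nid]
        if nid in self.l2:
            emb = self.l2[nid].to(self.device, non_blocking=True)
            self._insert_l1(nid, emb)
            return emb
        return None  # miss: caller must recompute or gather the embedding

    def insert(self, nid, emb):
        """Cache a freshly computed embedding in the GPU-resident level 1."""
        self._insert_l1(nid, emb.detach().to(self.device))

    def _insert_l1(self, nid, emb):
        if len(self.l1) >= self.l1_cap:
            self._evict_l1()
        self.l1[nid] = emb

    def _evict_l1(self):
        # Proactively demote the least frequently accessed entry to level 2.
        victim = min(self.l1, key=lambda n: self.hits.get(n, 0))
        emb = self.l1.pop(victim)
        if len(self.l2) >= self.l2_cap:
            l2_victim = min(self.l2, key=lambda n: self.hits.get(n, 0))
            self.l2.pop(l2_victim)
        self.l2[victim] = emb.to("cpu")


# Usage: cache embeddings produced during a training step, reuse them later.
cache = TwoLevelHistoricalCache(dim=128, l1_capacity=2, l2_capacity=4)
for nid in [0, 1, 2, 0, 3]:
    if cache.lookup(nid) is None:
        cache.insert(nid, torch.randn(128))
```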