Abstract: Large-scale graph neural network (GNN) training systems that use GPUs together with CPU memory and storage face the challenge of caching embedding data efficiently while preserving accuracy. In this paper, we propose HCGNN, an out-of-GPU-memory GNN training system that combines GPU-based sampling with historical embedding caching. The system supports dynamic caching of embedding data through a heuristic-based, two-level historical cache design that achieves a high cache hit ratio with lightweight proactive data eviction. Compared with state-of-the-art frameworks, HCGNN achieves up to 6.7x speedup on graph sampling and 4.3x speedup on feature gathering, with less than 0.5% accuracy loss.
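The abstract does not detail the cache's internals; the following is a minimal sketch of what a heuristic two-level historical-embedding cache with proactive eviction might look like, assuming a frequency-based eviction heuristic and a bounded-staleness policy. The class, method names, and parameters are hypothetical illustrations, not HCGNN's actual API.

```python
# Hypothetical sketch of a two-level historical-embedding cache:
# level 1 (GPU) holds hot node embeddings, level 2 (pinned CPU memory)
# holds warm ones; misses fall through to recomputation by the caller.
import torch


class TwoLevelHistoricalCache:
    def __init__(self, num_nodes, dim, gpu_slots, cpu_slots, max_staleness=3):
        self.max_staleness = max_staleness
        # Slot storage for cached historical embeddings on each level.
        self.gpu_buf = torch.zeros(gpu_slots, dim, device="cuda")
        self.cpu_buf = torch.zeros(cpu_slots, dim, pin_memory=True)
        # node_id -> (level, slot, epoch_cached); absent means not cached.
        self.loc = {}
        self.freq = torch.zeros(num_nodes)  # access counts for the heuristic
        self.free_gpu = list(range(gpu_slots))
        self.free_cpu = list(range(cpu_slots))

    def lookup(self, node_id, epoch):
        """Return a cached embedding if fresh enough, else None (miss)."""
        self.freq[node_id] += 1
        entry = self.loc.get(node_id)
        if entry is None:
            return None
        level, slot, cached_epoch = entry
        if epoch - cached_epoch > self.max_staleness:  # too stale: drop it
            self._evict(node_id)
            return None
        buf = self.gpu_buf if level == 0 else self.cpu_buf
        return buf[slot].to("cuda", non_blocking=True)

    def insert(self, node_id, emb, epoch):
        """Cache a freshly computed embedding, evicting a cold one if full."""
        if node_id in self.loc:
            self._evict(node_id)
        if self.free_gpu:                    # prefer the GPU level
            slot = self.free_gpu.pop()
            self.gpu_buf[slot] = emb
            self.loc[node_id] = (0, slot, epoch)
        elif self.free_cpu:                  # spill to pinned CPU memory
            slot = self.free_cpu.pop()
            self.cpu_buf[slot] = emb.cpu()
            self.loc[node_id] = (1, slot, epoch)
        else:                                # proactively evict the coldest node
            coldest = min(self.loc, key=lambda n: self.freq[n].item())
            self._evict(coldest)
            self.insert(node_id, emb, epoch)

    def _evict(self, node_id):
        level, slot, _ = self.loc.pop(node_id)
        (self.free_gpu if level == 0 else self.free_cpu).append(slot)
```

Under these assumptions, staleness-bounded lookups let training reuse historical embeddings instead of recomputing them, while the frequency heuristic keeps hot nodes resident on the GPU and demotes cold ones cheaply.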