Accelerating Tensor-Train Decomposition on Graph Neural Networks

Published: 2025 | Last Modified: 17 Feb 2026 | IPDPS 2025 | CC BY-SA 4.0
Abstract: Memory footprint is a major concern when training graph neural networks (GNNs) on large graph data. Tensor-train decomposition (TTD) offers a potential solution by representing high-dimensional tensors with a set of smaller tensors, reducing memory overhead. However, existing TTD-based solutions for GNNs fail to reuse intermediate computation results or to minimize memory data transfers, limiting GNN performance. We introduce FALCON, a software framework that accelerates TTD-based GNN training. FALCON leverages the observation that a small subset of graph nodes with high edge degrees is accessed frequently, which enables caching of intermediate results to reduce redundant computation and data transfers. It also incorporates multi-level graph partitioning and kernel optimization techniques to boost computational efficiency. We evaluated FALCON on three real-world datasets across three GPU platforms: NVIDIA 3090, 4090, and A100. Experimental results show that FALCON outperforms previous TTD-based frameworks, delivering a $1.3\times$ to $8.17\times$ improvement in throughput while achieving comparable or better memory footprint and model accuracy.
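The abstract combines two ideas: a tensor-train representation of a large node feature table (a chain of small cores replacing one big tensor) and caching of reconstructed results for frequently accessed, high-degree nodes. The NumPy sketch below illustrates both under assumed factorizations, TT ranks, and a hypothetical top-degree caching policy; it is an illustrative sketch, not FALCON's actual implementation.

```python
# Minimal sketch: TT-decomposed embedding table + degree-based cache.
# All shapes, ranks, and the cache policy are illustrative assumptions.
import numpy as np

# Factor the node count N = 8*8*8 = 512 and feature dim D = 4*4*4 = 64.
node_factors, feat_factors, ranks = (8, 8, 8), (4, 4, 4), (1, 16, 16, 1)

rng = np.random.default_rng(0)
# One TT core per factor pair: shape (r_{k-1}, n_k, d_k, r_k).
cores = [rng.standard_normal(
             (ranks[k], node_factors[k], feat_factors[k], ranks[k + 1])) * 0.1
         for k in range(3)]

def tt_embedding(node_id):
    """Reconstruct one node's feature row by contracting the TT cores."""
    # Map the flat node id to a multi-index (i1, i2, i3).
    idx = np.unravel_index(node_id, node_factors)
    # First core slice: (1, d_1, r_1) -> (d_1, r_1).
    result = cores[0][:, idx[0], :, :].reshape(feat_factors[0], ranks[1])
    for k in range(1, 3):
        core_slice = cores[k][:, idx[k], :, :]        # (r_{k-1}, d_k, r_k)
        result = np.einsum('ar,rdb->adb', result, core_slice)
        result = result.reshape(-1, ranks[k + 1])     # fold feature dims
    return result.reshape(-1)                         # length D = 64

# Cache reconstructed rows for high-degree ("hot") nodes so repeated
# accesses skip the TT contraction (assumed top-degree policy).
degrees = rng.poisson(4, size=512)
hot_nodes = np.argsort(degrees)[-32:]
cache = {int(v): tt_embedding(int(v)) for v in hot_nodes}

def lookup(node_id):
    cached = cache.get(node_id)
    return cached if cached is not None else tt_embedding(node_id)
```

In this toy setting the three cores hold 9,216 parameters versus 32,768 for the dense 512x64 table, which is the kind of memory reduction TTD targets; the cache then trades a small amount of that saving for fewer repeated contractions on hot nodes.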