Accelerator for LLM-Enhanced GNN with Product Quantization and Unified Indexing

Jiaming Xu, Jinhao Li, Jun Liu, Hao Zhou, Guohao Dai

Published: 20 Jan 2025, Last Modified: 06 Nov 2025, License: CC BY-SA 4.0
Abstract: To alleviate the vulnerability of graph neural networks (GNNs) on unseen graphs, many works propose integrating large language models (LLMs) into GNNs, yielding what are called graph foundation models (GFMs). The LLM-enhanced GNN, a typical integration method of GFMs, has achieved state-of-the-art performance in most graph-related tasks. However, the intensive general matrix multiplication (GEMM) overhead of LLMs poses a significant challenge to end-to-end inference latency. The introduction of LLMs brings 100× more workload than the original GNNs, with GEMMs accounting for more than 99% of it and becoming the bottleneck of end-to-end inference. To tackle this challenge, we present GFMEngine, an algorithm and hardware co-design accelerator supporting LLM-enhanced GNNs at multiple levels. (1) At the algorithm level, we point out that the computational precision of LLMs has little impact on the end-to-end accuracy, and propose a product-quantization-based (PQ-based) matrix multiplication for LLMs to alleviate their intensive GEMMs, reducing computation by more than 70% with negligible accuracy loss. (2) At the hardware level, we point out that PQ-based matrix multiplication effectively alleviates the intensive GEMMs but results in a substantial increase in dynamic memory access. Since dynamic memory access is also inherent in GNNs, we design a unified indexing unit as the hardware support for both, reducing memory access by ~30% in end-to-end inference. (3) At the compilation level, we further design an extensible instruction set architecture, GFM-ISA, as the software support for various real-world GFM tasks. We implement GFMEngine in a TSMC 28nm process, and extensive experiments show that GFMEngine achieves up to 3.93×, 38.66×, 22.32×, 2.96× speedup and up to 102.52×, 37.82×, 28.37×, 2.56× energy efficiency improvement compared with the NVIDIA Tesla A100 and the domain-specific accelerators SGCN, MEGA, and FACT, respectively.
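To make the idea of PQ-based matrix multiplication concrete, the following is a minimal sketch of the generic lookup-table technique, not GFMEngine's actual algorithm, which the abstract does not detail. The function names (`encode`, `build_tables`, `pq_matvec`), the hand-picked codebooks, and the toy weight matrix are all hypothetical; in practice codebooks would be learned (for example with k-means). The input vector is split into sub-vectors, each sub-vector is replaced by the index of its nearest centroid, and the product is assembled by summing precomputed centroid-times-weight entries.

```python
def encode(x, codebooks):
    """Replace each sub-vector of x with the index of its nearest centroid."""
    codes = []
    for s, book in enumerate(codebooks):
        d = len(book[0])                      # sub-vector length
        sub = x[s * d:(s + 1) * d]
        nearest = min(
            range(len(book)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(sub, book[k])),
        )
        codes.append(nearest)
    return codes


def build_tables(codebooks, W):
    """Precompute tables[s][k][j] = centroid k of subspace s, dotted with
    the matching rows of W, for output column j (done once, offline)."""
    n_out = len(W[0])
    tables = []
    for s, book in enumerate(codebooks):
        d = len(book[0])
        rows = W[s * d:(s + 1) * d]
        tables.append([
            [sum(c[i] * rows[i][j] for i in range(d)) for j in range(n_out)]
            for c in book
        ])
    return tables


def pq_matvec(x, codebooks, tables):
    """Approximate x @ W using only table lookups and additions."""
    codes = encode(x, codebooks)
    out = [0.0] * len(tables[0][0])
    for s, k in enumerate(codes):             # k is data-dependent
        for j in range(len(out)):
            out[j] += tables[s][k][j]         # indexed (dynamic) read
    return out


# Hypothetical toy data: 4-dim input, 2 subspaces, 2 centroids each.
codebooks = [[[0, 0], [1, 1]], [[1, 0], [0, 1]]]
W = [[1, 2], [3, 4], [5, 6], [7, 8]]
tables = build_tables(codebooks, W)
print(pq_matvec([0.9, 1.1, 0.1, 0.8], codebooks, tables))
```

In this sketch the multiply-accumulate work moves into `build_tables`, which runs once per weight matrix, while each inference does nearest-centroid encoding plus table reads. Those reads depend on the input (`k` is only known at run time), which illustrates why the abstract notes that PQ-based multiplication trades arithmetic for dynamic memory access.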