A Framework for Graph Machine Learning on Heterogeneous Architecture

Yi-Chien Lin, Viktor K. Prasanna

Published: 2023, Last Modified: 30 Sept 2024, Venue: FCCM 2023, License: CC BY-SA 4.0

Abstract: Graph Machine Learning (Graph ML) has shown great success in many domains where data is represented as graphs, such as Electronic Design Automation (EDA) [1], traffic prediction [2], and recommendation systems [3]. In real-world scenarios, these domains often involve large-scale graphs with billions of edges [4]; training Graph ML models on such datasets using only a single CPU or GPU would take hours or even days [5], which calls for acceleration. To provide more computing power and to process different types of workloads efficiently, state-of-the-art machines feature heterogeneous architectures [6], [7] that consist of multiple CPUs and a variety of accelerators such as GPUs, FPGAs, or AI-specific accelerators [8]–[10]. Heterogeneous architectures have great potential to accelerate Graph ML; however, due to the complexity of such platforms, doing so remains challenging. First, accelerating Graph ML on a heterogeneous architecture requires extensive programming effort [11]. In particular, launching each accelerator requires a different program (e.g., written in CUDA for a GPU or Verilog for an FPGA), and one also needs to develop a complex host program to coordinate tasks among the CPUs and the accelerators. Second, to achieve high performance, the Graph ML kernels on each device (i.e., the CPUs or the accelerators) need to be highly optimized. In addition, training a Graph ML model on multiple devices in parallel suffers from workload imbalance and high communication overhead [12]; these issues limit the speedup that can be achieved on a heterogeneous architecture. Finally, it is challenging to build a portable design that can run on various heterogeneous architectures while achieving high performance. Portability is critical for building an impactful design that contributes to the research community and the industry, as it allows others to easily build upon the work.
To this end, we propose a novel framework for accelerating Graph ML training on heterogeneous architectures. The framework aims to achieve high programmability, high performance, and high portability; Section III describes how these three objectives are achieved.