Abstract: This paper presents the first compute-in-memory (CIM) graph neural network (GNN) processor, CIMGN, which runs complete GNN inference blocks with up to 16-bit precision. Accelerating GNNs is more challenging than accelerating DNNs because GNNs combine compute-bound Transformation with memory-bound Aggregation, which leads to low efficiency and low bandwidth utilization on traditional architectures. CIMGN uses a heterogeneous macro that performs Transformation in the CIM array while handling Aggregation in a close-coupled, content-addressable memory (CAM)-enabled search-reduce engine. An all-digital CIM-CAM macro is first proposed and optimized with a hierarchical Result Line (RL) and block-wise/fine-grained CAM search skipping, yielding a 14.8% power saving for matrix-vector multiplication (MVM) and a 37% reduction in CAM search operations on the CORA dataset. A multicore architecture is then developed to support key GNN operations, including Transformation, Aggregation, Point-wise, and Activation. Bit-wise sparsity is exploited to save both power and cycles. Finally, two dedicated dataflows are proposed for adaptive mapping, leveraging CAM search skipping and intra-edge data forwarding. The silicon prototype is fabricated in TSMC 65nm CMOS technology and benchmarked on classic graph datasets. Measurement results show a macro-level efficiency of 287 to 0.91 TOPS/W (1- to 16-bit) and a system-level efficiency of $51.7\times10^{3}$ Graph/J.
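To make the Transformation/Aggregation split and the idea of bit-wise sparsity concrete, the following is a minimal software sketch, not the CIMGN hardware dataflow. It assumes unsigned integer input activations processed bit-serially, and it assumes skipping happens at the granularity of a whole all-zero input bit-plane; the abstract does not specify the actual skip granularity. The function names `bit_serial_mvm` and `aggregate` are hypothetical, and Aggregation is shown as a plain edge-list sum rather than a CAM search-reduce.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def bit_serial_mvm(x: List[int], w: Matrix, bits: int = 8) -> Tuple[List[int], int]:
    """Transformation sketch: compute y = x @ w one input bit-plane at a time.

    Assumes unsigned `bits`-bit inputs. A bit-plane in which every input bit
    is zero contributes nothing, so it is skipped (bit-wise sparsity). Returns
    the result and the number of bit-planes (cycles) actually processed.
    """
    cols = len(w[0])
    y = [0] * cols
    cycles = 0
    for b in range(bits):
        plane = [(xi >> b) & 1 for xi in x]
        if not any(plane):
            continue  # all-zero bit-plane: no work, cycle saved
        cycles += 1
        for j in range(cols):
            # Sum weights on rows whose input bit is 1, then apply bit weight 2^b.
            partial = sum(w[i][j] for i, bit in enumerate(plane) if bit)
            y[j] += partial << b
    return y, cycles


def aggregate(features: Matrix, edges: List[Tuple[int, int]]) -> Matrix:
    """Aggregation sketch: sum-reduce neighbor features, out[dst] += features[src].

    This step is irregular and memory-bound: each edge gathers a whole
    feature row, with little arithmetic per byte moved.
    """
    dim = len(features[0])
    out = [[0] * dim for _ in features]
    for src, dst in edges:
        row = out[dst]
        for k, v in enumerate(features[src]):
            row[k] += v
    return out


if __name__ == "__main__":
    # Inputs 3 (011), 0 (000), 5 (101): only bit-planes 0..2 are non-zero.
    y, cycles = bit_serial_mvm([3, 0, 5], [[1, -2], [4, 0], [2, 3]], bits=8)
    print(y, cycles)  # [13, 9] 3
    h = [y, [1, 1], [2, 0]]
    print(aggregate(h, [(1, 0), (2, 0), (0, 1)]))  # [[3, 1], [13, 9], [0, 0]]
```

With 8-bit inputs, the example processes only 3 of 8 bit-planes, which illustrates why sparse high-order bits can save both power and cycles in a bit-serial design.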