SA-MLP: Distilling Graph Knowledge from GNNs into Structure-Aware MLP

TMLR Paper 2334 Authors

05 Mar 2024 (modified: 20 Mar 2024) · Under review for TMLR
Abstract: The recursive node fetching and aggregation of message passing introduce inference latency when Graph Neural Networks (GNNs) are deployed on large-scale graphs. One promising direction for accelerating inference is to distill GNNs into message-passing-free student Multi-Layer Perceptrons (MLPs). However, an MLP student without graph dependency cannot fully learn the structural knowledge held by the GNN, which leads to inferior performance in heterophilic and online scenarios. To address this problem, we first design a simple yet effective Structure-Aware MLP (SA-MLP) as the student model. It uses linear layers as encoders and decoders to capture both node features and graph structure without message passing among nodes. We further introduce a novel structure-mixing knowledge distillation technique, which generates virtual samples that blend the structural knowledge of the teacher GNN, thereby strengthening the MLP's ability to learn structural information. Extensive experiments on eight benchmark datasets under both transductive and online settings show that SA-MLP consistently matches or exceeds its teacher GNNs while retaining the fast inference speed of MLPs. Our findings reveal that SA-MLP efficiently assimilates graph knowledge through end-to-end distillation from GNNs, without complex model architectures or preprocessing of features and structures.
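For intuition, the sketch below illustrates one plausible reading of the abstract's two ideas: a student with separate linear encoders for node features and structure (no message passing at inference), and a mixup-style "structure-mixing" distillation loss over teacher soft labels. The abstract gives no implementation details, so the class and function names (SAMLP, structure_mixing_kd_loss), the use of adjacency rows as the structure input, and the interpolation scheme are assumptions, not the authors' exact method.

```python
# Hypothetical sketch of SA-MLP and structure-mixing distillation (assumptions,
# not the paper's released implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SAMLP(nn.Module):
    """Structure-Aware MLP: linear encoders for node features and structure,
    plus a linear decoder; no message passing between nodes."""

    def __init__(self, feat_dim, struct_dim, hidden_dim, num_classes):
        super().__init__()
        self.feat_encoder = nn.Linear(feat_dim, hidden_dim)      # encodes node features
        self.struct_encoder = nn.Linear(struct_dim, hidden_dim)  # encodes e.g. an adjacency row (assumed)
        self.decoder = nn.Linear(hidden_dim, num_classes)        # maps the fused code to class logits

    def forward(self, x, s):
        h = F.relu(self.feat_encoder(x) + self.struct_encoder(s))  # simple additive fusion (assumed)
        return self.decoder(h)


def structure_mixing_kd_loss(student, x, s, teacher_logits, alpha=0.2):
    """One possible 'structure-mixing' distillation step: interpolate the
    structure inputs and the teacher's soft labels of random node pairs to form
    virtual samples, then match the student to the mixed teacher targets."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    s_mix = lam * s + (1 - lam) * s[perm]                          # mix structure inputs
    t_mix = lam * teacher_logits + (1 - lam) * teacher_logits[perm]  # mix teacher soft labels
    student_logp = F.log_softmax(student(x, s_mix), dim=-1)
    return F.kl_div(student_logp, F.softmax(t_mix, dim=-1), reduction="batchmean")


if __name__ == "__main__":
    n, feat_dim, num_classes = 32, 16, 4
    x = torch.randn(n, feat_dim)                   # node features
    s = torch.rand(n, n)                           # toy dense adjacency rows as structure input
    teacher_logits = torch.randn(n, num_classes)   # stand-in for teacher GNN outputs
    model = SAMLP(feat_dim, struct_dim=n, hidden_dim=64, num_classes=num_classes)
    loss = structure_mixing_kd_loss(model, x, s, teacher_logits)
    loss.backward()
    print(float(loss))
```

Because the student only consumes per-node feature and structure vectors, inference is a fixed number of dense matrix multiplications per node, which is where the MLP-level speed claimed in the abstract would come from.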
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Danny_Tarlow1
Submission Number: 2334