GraTeD-MLP: Efficient Node Classification via Graph Transformer Distillation to MLP

Published: 16 Nov 2024, Last Modified: 26 Nov 2024, LoG 2024 Poster, CC BY 4.0
Keywords: Graph Transformers, Knowledge Distillation, Node Classification
TL;DR: We propose a novel framework to effectively distill the attention learned by a graph transformer into a scalable MLP model
Abstract: Graph Transformers (GTs) like NAGphormer have shown impressive performance by encoding a graph's structural information together with node features. However, their self-attention and complex architectures require high computation and memory, hindering their deployment. We therefore propose a novel framework called Graph Transformer Distillation to Multi-Layer Perceptron (GraTeD-MLP). GraTeD-MLP leverages knowledge distillation (KD) and a novel decomposition of the attentional representation to distill the learned representations from the teacher GT to a student MLP. During distillation, we incorporate a gated MLP architecture in which two branches learn the decomposed attentional representation for a node while a third predicts node embeddings. Encoding the attentional representation mitigates the MLP's over-reliance on node features, enabling robust performance even in inductive settings. Empirical results demonstrate that GraTeD-MLP achieves significantly faster inference than the teacher GT model, with speed-ups ranging from 20× to 40×, and up to 25% better performance than a vanilla MLP. Furthermore, we empirically show that GraTeD-MLP outperforms other GNN distillation methods on seven datasets in both inductive and transductive settings.
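The abstract describes a gated MLP student with two branches for the decomposed attentional representation and a third branch for node embeddings. Since the paper's exact equations are not given here, the following is only a minimal, hypothetical PyTorch sketch of such a three-branch gated student; the module names, dimensions, gating scheme, and recombination of the two attention branches are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GraTeDMLPStudent(nn.Module):
    """Hypothetical sketch of a three-branch gated MLP student.

    Two branches approximate the decomposed attentional representation
    distilled from the GT teacher; a third predicts node embeddings.
    The gating and recombination below are illustrative assumptions.
    """

    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        # Two branches for the decomposed attentional representation.
        self.attn_branch_a = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, hidden_dim)
        )
        self.attn_branch_b = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, hidden_dim)
        )
        # Third branch predicting the node embedding from raw features.
        self.embed_branch = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, hidden_dim)
        )
        # Gate mixing the attentional and embedding paths (assumed design).
        self.gate = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x: torch.Tensor):
        a = self.attn_branch_a(x)
        b = self.attn_branch_b(x)
        h = self.embed_branch(x)
        attn_repr = a * b                    # assumed recombination of the decomposition
        g = self.gate(x)
        fused = g * attn_repr + (1 - g) * h  # gated fusion of the two paths
        logits = self.classifier(fused)
        # Branch outputs can be matched to the teacher's decomposed attention
        # and embeddings with distillation losses (e.g., MSE plus KL on logits).
        return logits, (a, b, h)


# Usage: at inference only node features are needed, so prediction runs at MLP speed.
model = GraTeDMLPStudent(in_dim=128, hidden_dim=64, num_classes=7)
logits, _ = model(torch.randn(1000, 128))
```

Because the student consumes only node features and the distilled attentional signal, it avoids neighborhood aggregation at inference time, which is where the reported 20×–40× speed-up over the teacher GT would come from.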
Submission Type: Full paper proceedings track submission (max 9 main pages).
Submission Number: 96