TL;DR: We develop TINED, a novel method that distills layer-level GNN knowledge into MLPs via teacher injection with fine-tuning and Dirichlet energy distillation, thereby accelerating model inference.
Abstract: Graph Neural Networks (GNNs) are pivotal in graph-based learning, particularly excelling in node classification. However, their scalability is hindered by the need for multi-hop data during inference, limiting their application in latency-sensitive scenarios. Recent efforts to distill GNNs into multi-layer perceptrons (MLPs) for faster inference often underutilize the layer-level insights of GNNs. In this paper, we present TINED, a novel approach that distills GNNs to MLPs on a layer-by-layer basis using Teacher Injection and Dirichlet Energy Distillation techniques.
We focus on two key operations in GNN layers: feature transformation (FT) and graph propagation (GP). Observing that FT is computationally equivalent to a fully-connected (FC) layer in an MLP, we propose directly transferring teacher parameters from each FT in the GNN to an FC layer in the student MLP, followed by fine-tuning. In TINED, the FC layers of the student MLP mirror the sequence of FTs and GPs in the teacher GNN. We also establish a theoretical bound on the approximation of GP.
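To make the teacher-injection idea concrete, below is a minimal PyTorch sketch of how FT weights from a teacher GNN could be copied into FC layers of a student MLP, with an additional trainable FC layer standing in for each GP step. The helper name `build_student`, the assumption that the teacher's FT weights are available as `torch.nn.Linear` modules, and the exact layer layout are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of teacher injection, assuming the teacher's feature-transformation
# (FT) weights are exposed as torch.nn.Linear modules (e.g., the linear
# transform inside each GNN layer). Illustrative only.
import torch
import torch.nn as nn

def build_student(teacher_ft_linears, act=nn.ReLU):
    """For each teacher layer, add (i) an FC layer initialized from the FT
    weights (teacher injection, to be fine-tuned later) and (ii) a trainable
    FC layer intended to approximate the graph-propagation (GP) step."""
    blocks = []
    for lin in teacher_ft_linears:
        ft_fc = nn.Linear(lin.in_features, lin.out_features,
                          bias=lin.bias is not None)
        with torch.no_grad():
            ft_fc.weight.copy_(lin.weight)          # inject teacher FT weights
            if lin.bias is not None:
                ft_fc.bias.copy_(lin.bias)
        gp_fc = nn.Linear(lin.out_features, lin.out_features)  # learns to mimic GP
        blocks += [ft_fc, gp_fc, act()]
    return nn.Sequential(*blocks[:-1])              # no activation after the last block
```

The student here contains two FC layers per teacher GNN layer, reflecting the FT/GP sequence described above; the injected FT layers are then fine-tuned together with the GP-approximating layers during distillation.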
Furthermore, we note that the FT and GP operations in GNN layers often exhibit opposing smoothing effects: GP smooths aggressively, while FT is conservative. Using Dirichlet energy (DE), we develop a DE ratio to quantify these effects and propose Dirichlet Energy Distillation to transfer these layer-level characteristics from the GNN to the MLP. Extensive experiments show that TINED outperforms GNNs and leading distillation methods across various settings and seven datasets. Source code is available at https://github.com/scottjiao/TINED_ICML25/.
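As a rough illustration of how the smoothing effect of a layer operation can be measured, the sketch below computes the Dirichlet energy of node features over a graph and a before/after ratio. The edge-list representation, the `de_ratio` helper, and the lack of normalization are assumptions based on the abstract; the DE ratio in the paper may be defined differently.

```python
# Dirichlet energy E(X) = 0.5 * sum over edges (i, j) of ||x_i - x_j||^2,
# and a before/after ratio for one layer operation. Illustrative sketch.
import torch

def dirichlet_energy(x, edge_index):
    """x: [num_nodes, dim] features; edge_index: [2, num_edges] with both
    edge directions included (hence the 0.5 factor)."""
    src, dst = edge_index
    diff = x[src] - x[dst]
    return 0.5 * (diff * diff).sum()

def de_ratio(x_in, x_out, edge_index, eps=1e-12):
    """Ratio well below 1: the operation smooths aggressively (typical of GP);
    ratio near 1: conservative (typical of FT)."""
    return dirichlet_energy(x_out, edge_index) / (dirichlet_energy(x_in, edge_index) + eps)
```

In this reading, Dirichlet Energy Distillation would encourage each student FC layer to reproduce the DE ratio of the teacher operation it replaces, so that aggressive and conservative smoothing behaviors carry over to the MLP.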
Lay Summary: GNNs are powerful for analyzing graph data, such as social networks or molecular graphs. However, their core process, "message passing," can be slow during deployment, limiting their use in applications that require quick responses. A promising solution is to distill GNNs into Multi-Layer Perceptrons (MLPs): a GNN (the teacher) trains an MLP (the student) to match its predictions while running much faster at inference time.
We introduce TINED, a novel distillation method. We focus on two main operations in GNNs: feature transformation (FT) and graph propagation (GP). FT resembles the fully-connected (FC) layers of an MLP, allowing direct parameter transfer from the GNN to the MLP, followed by fine-tuning. The MLP is designed to mimic the sequence of FT and GP operations in the GNN, and we provide a theoretical guarantee for approximating GP. Moreover, we observe that FT and GP have opposite smoothing effects: GP smooths aggressively, while FT is more conservative. We measure these effects via Dirichlet energy and transfer them from the GNN to the MLP.
Our approach allows the simpler MLP to match or even outperform the original GNN and other methods. TINED enables faster and more accurate predictions, making it ideal for real-world applications where speed is critical.
Link To Code: https://github.com/scottjiao/TINED_ICML25/
Primary Area: General Machine Learning->Representation Learning
Keywords: GNN2MLP distillation, knowledge distillation
Submission Number: 6448