Weak Models Can be Good Teachers: A Case Study on Link Prediction with MLPs

Published: 08 Nov 2025, Last Modified: 08 Nov 2025 · LOG 2025 Poster · CC BY 4.0
Keywords: Link Prediction, GNN-to-MLP, Heuristics, Ensemble
TL;DR: We demonstrate that simple heuristics can serve as more efficient and effective teachers for MLPs than GNNs. Additionally, we introduce an ensemble approach that further improves effectiveness without adding inference latency.
Abstract: Link prediction is a crucial graph-learning task. Distilling Graph Neural Network (GNN) teachers into Multi-Layer Perceptron (MLP) students has emerged as an effective way to achieve strong performance while reducing computational cost, since it removes the graph dependency at inference time; this is especially valuable in applications such as citation prediction and product recommendation, where node features are abundant. However, existing distillation methods use only standard GNNs. Do stronger models, such as those specially designed for link prediction (e.g., GNN4LP), lead to better students? Are heuristic-based methods (e.g., common neighbors) bad teachers simply because they are weak models? This paper first explores the impact of different teachers in MLP distillation. Surprisingly, we find that stronger models do not always produce stronger students: MLPs distilled from GNN4LP can underperform those distilled from simpler GNNs, while weaker heuristic methods can teach MLPs to near-GNN performance at drastically reduced training cost. We provide both theoretical and empirical analysis to explain this phenomenon, revealing that a teacher is only as good as its teachable knowledge: the portion of its knowledge that can be transferred through the features accessible to the student. Building on these insights, we propose Ensemble Heuristic-Distilled MLPs (EHDM), which eliminates costly GNN training while effectively integrating complementary heuristic signals via a gating mechanism. Our extensive experiments show that EHDM reduces total training time by 1.95-3.32x while achieving an average 7.93% improvement over previous GNN-to-MLP approaches, indicating that it is an efficient and effective link prediction method.
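For readers unfamiliar with the "weak teacher" the abstract refers to, the snippet below is a minimal illustrative sketch (not the paper's code) of the common-neighbors heuristic: a candidate link (u, v) is scored by how many neighbors u and v share. In GNN-to-MLP distillation, such scores would supervise an MLP student that sees only node features at inference time. The adjacency-set representation and function names here are assumptions for illustration.

```python
def common_neighbors(adj, u, v):
    """Common-neighbors score: number of shared neighbors of u and v.

    `adj` maps each node to the set of its neighbors.
    """
    return len(adj[u] & adj[v])

# Tiny toy graph with edges 0-1, 0-2, 1-2, 2-3.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2},
}

# Candidate links (0, 3) and (1, 3) each share the single neighbor 2,
# so both receive a common-neighbors score of 1.
scores = {
    (0, 3): common_neighbors(adj, 0, 3),
    (1, 3): common_neighbors(adj, 1, 3),
}
```

Heuristics like this require no training at all, which is why distilling from them can drastically cut the total training cost compared to first training a GNN teacher.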
Supplementary Materials: zip
Poster: jpg
Poster Preview: jpg
Submission Type: Full paper proceedings track submission (max 9 main pages).
Submission Number: 159