Keywords: Graph Neural Networks, Knowledge Distillation, Compression
TL;DR: We propose STRIDE, a new knowledge distillation method that uses attention to align both the structure and the embeddings of important intermediate GNN layers, enabling high compression ratios (up to 141X) while matching or exceeding the accuracy of SOTA approaches.
Abstract: Recent advances in Graph Neural Networks (GNNs) have led to ever-larger models with greater capacity and accuracy. Such large models incur high memory usage, latency, and computational cost, restricting their deployment for inference. GNN compression techniques shrink large GNNs into smaller ones with negligible accuracy loss, and Knowledge Distillation (KD) is among the most promising of these techniques. However, most KD approaches for GNNs match only the outputs of the final layer and ignore the intermediate layers, which may carry important inductive biases indicated by the graph structure and node embeddings. Ignoring these layers can cause a substantial accuracy drop, especially at high compression ratios. To address these shortcomings, we propose a novel KD approach for GNN compression that we call Structure and Embedding Distillation with Attention (STRIDE). STRIDE uses attention to identify important intermediate teacher-student layer pairs and aligns graph structure and node embeddings across those pairs. We evaluate STRIDE on several datasets, including OGBN-Mag and OGBN-Arxiv, with different model architectures, including GCNII, RGCN, and GraphSAGE. On OGBN-Mag, a large graph dataset, STRIDE achieves on average a 2.13% accuracy increase at a 32.3X compression ratio compared to state-of-the-art approaches. On smaller datasets (e.g., Pubmed), STRIDE reaches up to a 141X compression ratio with higher accuracy than state-of-the-art approaches. These results highlight the effectiveness of intermediate-layer knowledge for obtaining compact, accurate, and practical GNN models. During the discussion phase, we will privately share the anonymized repo with reviewers and area chairs, and we will release it publicly upon acceptance.
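To make the described mechanism concrete, the PyTorch sketch below illustrates one way attention-weighted intermediate-layer distillation could look. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give STRIDE's equations, so the function name `stride_loss`, the similarity-based attention scoring rule, and the pairwise-similarity structure proxy are all illustrative assumptions.

```python
# Illustrative sketch only: STRIDE's exact formulation is not specified in the
# abstract. The scoring rule and loss terms below are assumptions.
import torch
import torch.nn.functional as F

def stride_loss(teacher_feats, student_feats):
    """Attention-weighted alignment of intermediate teacher/student layers.

    teacher_feats, student_feats: lists of [num_nodes, dim] node-embedding
    tensors, one per intermediate layer. Equal dims are assumed here; a real
    implementation would likely insert projection heads to match dimensions.
    """
    pair_losses, pair_scores = [], []
    for t in teacher_feats:
        for s in student_feats:
            # Embedding alignment: distance between node embeddings.
            emb_loss = F.mse_loss(s, t)
            # Structure alignment: match pairwise node-similarity matrices,
            # a common proxy for graph structure in distillation work.
            t_sim = F.normalize(t, dim=1) @ F.normalize(t, dim=1).T
            s_sim = F.normalize(s, dim=1) @ F.normalize(s, dim=1).T
            struct_loss = F.mse_loss(s_sim, t_sim)
            pair_losses.append(emb_loss + struct_loss)
            # Attention score for this layer pair, here taken as the cosine
            # similarity of mean-pooled layer embeddings (assumed rule).
            pair_scores.append((F.normalize(t.mean(0), dim=0)
                                * F.normalize(s.mean(0), dim=0)).sum())
    # Softmax over all teacher-student pairs focuses the loss on the pairs
    # the attention scores deem important.
    weights = torch.softmax(torch.stack(pair_scores), dim=0)
    return (weights * torch.stack(pair_losses)).sum()
```

The pairwise-similarity matrix is used here because it is a standard structure proxy in relational distillation; a faithful implementation would follow whatever structure and attention definitions the paper specifies.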
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 9899