Spatio-Temporal Graph Knowledge Distillation

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: pdf
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Spatial-Temporal Data Mining, Graph Neural Networks, Urban Computing, Knowledge Distillation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a model-agnostic knowledge distillation framework for large-scale spatio-temporal graph learning that learns generalizable and robust MLPs on STGs from teacher STGNNs in a simple yet effective way.
Abstract: Large-scale spatio-temporal prediction is a critical area of research in data-driven urban computing, with far-reaching implications for transportation, public safety, and environmental monitoring. However, scalability and generalization continue to pose significant obstacles. While many advanced models rely on Graph Neural Networks (GNNs) to encode spatial and temporal correlations, they often struggle with the increased time and space complexity of large-scale datasets. The recursive GNN-based message passing schemes used in these models make their training and deployment difficult in real-life urban sensing scenarios. Additionally, large-scale spatio-temporal data covering long time spans introduce distribution shifts, further highlighting the need for models with improved generalization performance. To address these challenges, we propose the Spatio-Temporal Graph Knowledge Distillation (STGKD) paradigm, which learns lightweight and robust Multi-Layer Perceptrons (MLPs) through effective knowledge distillation from cumbersome spatio-temporal GNNs. To ensure robust knowledge distillation, we integrate the spatio-temporal information bottleneck with a teacher-bounded regression loss, which filters out task-irrelevant noise and avoids erroneous teacher guidance, resulting in robust knowledge transfer. Additionally, we enhance the generalization ability of the student MLP by incorporating spatial and temporal prompts that inject downstream task contexts. We evaluate our framework on three large-scale spatio-temporal datasets covering various urban computing tasks. Experimental results demonstrate that our model outperforms state-of-the-art approaches in both efficiency and accuracy.
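The abstract's "teacher-bounded regression loss" is not spelled out on this page, so the sketch below shows one common formulation from the knowledge distillation literature: the student is pulled toward the teacher only where the teacher's prediction is actually closer to the ground truth than the student's, which prevents a poorly calibrated teacher from degrading the student. The function name, the `margin` and `lambda_kd` hyperparameters, and the combined objective are illustrative assumptions, not the paper's confirmed design.

```python
import torch
import torch.nn.functional as F

def teacher_bounded_regression_loss(student_pred: torch.Tensor,
                                    teacher_pred: torch.Tensor,
                                    target: torch.Tensor,
                                    margin: float = 0.0) -> torch.Tensor:
    """Hypothetical sketch of a teacher-bounded regression loss.

    The distillation term is applied only where the student's squared
    error exceeds the teacher's (plus an optional margin), so the
    student is never regressed toward teacher predictions that are
    worse than its own -- i.e., erroneous guidance is filtered out.
    """
    teacher_pred = teacher_pred.detach()                 # teacher is frozen
    student_err = (student_pred - target) ** 2           # per-element student error
    teacher_err = (teacher_pred - target) ** 2           # per-element teacher error
    # Distill only where the teacher genuinely outperforms the student.
    mask = (student_err + margin > teacher_err).float()
    distill = mask * (student_pred - teacher_pred) ** 2
    return distill.mean()

# Illustrative combined objective (weighting is an assumption):
#   loss = F.mse_loss(student_pred, target) \
#        + lambda_kd * teacher_bounded_regression_loss(student_pred,
#                                                      teacher_pred, target)
```

In a setup like the one the abstract describes, `teacher_pred` would come from the pretrained spatio-temporal GNN and `student_pred` from the lightweight MLP, with the masked term acting as the "bounded" safeguard against transferring the teacher's mistakes.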
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5536