Knowledge distillation methods for spatio-temporal tasks: a survey

Xiran Li, Shen Gao, Shuo Shang

Published in GeoInformatica, 2026 (last modified: 06 Feb 2026). License: CC BY-SA 4.0.
Abstract: Spatio-temporal tasks, such as point-of-interest (POI) recommendation, increasingly rely on large models to capture complex spatial and temporal dependencies. However, the high computational cost and deployment challenges of these models hinder their practical application. To tackle these challenges, knowledge distillation (KD) has emerged as a solution that transfers knowledge from large, high-performance teacher models to lightweight student models while preserving accuracy and efficiency. Despite its success in various domains, the application of KD to spatio-temporal tasks presents unique challenges due to the dynamic, multi-modal, and often incomplete nature of the data. In this paper, we provide a comprehensive review of knowledge distillation methods for spatio-temporal tasks. We first outline the key motivations for applying KD in these tasks, including model compression, robustness to incomplete data, and cross-domain generalization. Next, we categorize and analyze distillation methodologies by their training objectives (output, feature, and relation knowledge) and discuss their respective advantages and limitations. Furthermore, we explore different distillation strategies, such as offline, online, and self-distillation, as well as diverse architectural frameworks, ranging from single-teacher setups to multi-teacher collaborative systems. We then discuss applications of KD to several downstream tasks. Finally, we highlight emerging trends and future research directions. By synthesizing recent advances and identifying open challenges, we aim to provide a valuable reference for building efficient and scalable spatio-temporal learning models.
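For readers unfamiliar with the output-level knowledge category named in the abstract, the sketch below shows the classic soft-label distillation loss (temperature-scaled KL divergence plus a hard-label term, in the style of Hinton et al., 2015) in PyTorch. It is a minimal generic illustration, not a method from this survey; the function name and the default temperature and weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Output-level KD: temperature-softened KL term plus hard-label CE.

    student_logits, teacher_logits: (batch, num_classes) raw scores.
    T: temperature; higher T softens the teacher's output distribution.
    alpha: weight on the distillation term vs. the supervised term.
    (All defaults here are illustrative, not values from the survey.)
    """
    # Soft targets: the teacher's class distribution, softened by T.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence between the softened distributions; the T**2 factor
    # keeps gradient magnitudes comparable across temperature settings.
    kd_term = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T ** 2)
    # Standard supervised loss on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, targets)
    return alpha * kd_term + (1.0 - alpha) * ce_term
```

Feature and relation distillation, discussed later in the survey, replace or augment this output-matching term with losses on intermediate representations or on pairwise relations between samples.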