Multimodal Vehicle Motion Prediction Based on Motion-Query Social Transformer Network for Internet of Vehicles

Hao Jiang, Baixuan Zhao, Chuan Hu, Hao Chen, Xi Zhang

Published: 15 Jul 2025, Last Modified: 05 Nov 2025, IEEE Internet of Things Journal, CC BY-SA 4.0

Abstract: Accurate prediction of vehicle motions is imperative for enabling cooperative perception and planning of autonomous vehicles; however, effectively modeling the complex spatio-temporal interactions and long-term dependencies between vehicles remains a formidable challenge. To tackle these issues, we propose a novel motion query-based social transformer network (MOST) that predicts vehicle trajectories and intentions through a multitask approach. MOST is organized hierarchically into a temporal transformer encoder module, a social interaction module, and a motion query-based multitask feature decoder. The temporal transformer captures long-range temporal correlations of individual motion states through a self-attention mechanism with a residual connection, while the social interaction module captures the spatial interaction dependencies between vehicles by constructing social tensors. Furthermore, considering the uncertainty and diversity of future vehicle behaviors, we propose a motion query-based feature decoder, which is equipped with learnable parameters to assimilate prior knowledge and to generate multiple possible future trajectories and intentions. To assess the model's effectiveness, we carried out comprehensive experiments on the open-source NGSIM and HighD datasets. The results show that our approach outperforms the state-of-the-art method, with an average prediction accuracy improvement of about 50% on the NGSIM dataset and 20% on the HighD dataset.
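As a rough illustration of two mechanisms the abstract names, the sketch below shows self-attention with a residual connection over a vehicle's motion-state sequence, and a set of motion queries that each attend to the encoded history to produce one feature per candidate future mode. This is a minimal sketch under stated assumptions and not the authors' implementation: the function names are hypothetical, the attention uses identity projections in place of learned weight matrices, and the query values are fixed numbers standing in for parameters that would be learned in training.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, memory):
    """Scaled dot-product attention of one query over the rows of memory."""
    d = len(query)
    weights = softmax([dot(query, row) / math.sqrt(d) for row in memory])
    return [sum(w * row[i] for w, row in zip(weights, memory)) for i in range(d)]


def self_attention_residual(states):
    """Each time step attends to all time steps; the input is added back
    (residual connection). Assumption: identity Q/K/V projections."""
    return [[x + a for x, a in zip(s, attend(s, states))] for s in states]


def decode_with_queries(queries, memory):
    """Each motion query attends to the encoded history and yields one
    feature vector, i.e. one candidate mode. Assumption: a real decoder
    would map each feature to a trajectory and an intention score."""
    return [attend(q, memory) for q in queries]


# Hypothetical toy input: 3 time steps of a 2-D motion state.
history = [[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
encoded = self_attention_residual(history)

# Hypothetical stand-ins for K = 2 learned motion queries.
motion_queries = [[1.0, 0.0], [0.0, 1.0]]
mode_features = decode_with_queries(motion_queries, encoded)
print(len(mode_features), "candidate mode features")
```

Using several queries rather than one is what makes the output multimodal in this sketch: each query weights the encoded history differently, so each can specialize toward a different plausible maneuver.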

External IDs: doi:10.1109/jiot.2025.3567291