Abstract: Most existing work focuses on map-based trajectory prediction for a single agent. However, High Definition (HD) maps are often unavailable in real-world traffic scenarios, and many autonomous and assisted driving applications require simultaneous multi-agent trajectory prediction. This paper therefore addresses map-free multi-agent trajectory prediction and proposes a Query-based Spatial and Temporal Interaction (QSTI) model. QSTI first encodes the historical information of agents as queries. To make the model aware of both spatial and temporal information, spatial queries are learned by the proposed progressive cross-attention encoder, and temporal queries are learned by the proposed flatten self-attention encoder. The spatial and temporal queries are then fed into a graph convolution network and a transformer network to learn the spatial-temporal interactions among agents, yielding interacted queries that are used for multi-agent trajectory prediction. Experiments on the Argoverse 1, Argoverse 2, highD, and rounD datasets show that QSTI outperforms state-of-the-art (SOTA) map-free methods and even outperforms some map-based methods on the Argoverse 1 and Argoverse 2 benchmarks. In addition, the model exhibits faster inference than many existing models.
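As a rough illustration of what attending along the temporal and spatial axes of agent histories means, the following is a minimal sketch of generic scaled dot-product attention in plain Python. It is not the QSTI architecture: the progressive cross-attention encoder, the flatten self-attention encoder, the graph convolution network, and all learned projections are omitted, and the names `attention`, `history`, `temporal`, and `spatial_by_step`, as well as the toy data, are assumptions introduced here for illustration only.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    No learned projections are applied; this is the bare mechanism only.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Hypothetical toy input: history[agent][time] = (x, y) position.
history = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 1.0), (1.0, 1.5), (2.0, 2.0)],
]

# Temporal mixing: each agent's time steps attend to one another.
temporal = [attention(steps, steps, steps) for steps in history]

# Spatial mixing: at each time step, agents attend to one another.
spatial_by_step = []
for t in range(len(history[0])):
    agents_at_t = [list(agent[t]) for agent in history]
    spatial_by_step.append(attention(agents_at_t, agents_at_t, agents_at_t))
```

Here `temporal` has one mixed sequence per agent, and `spatial_by_step` has one mixed set of agent features per time step. A real model would apply learned query, key, and value projections before attention and a decoder afterwards to produce future trajectories.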
External IDs: dblp:journals/tvt/NanXLX25