Efficient and Accurate Cross-Camera Vehicle Trajectory Recovery

Taihang Dong, Dingyu Yang, Dongping Zhang, Sai Wu, Shaojie Qiao, Dongxiang Zhang

Published: 01 Feb 2026, Last Modified: 23 Jan 2026 · IEEE Transactions on Knowledge and Data Engineering · CC BY-SA 4.0
Abstract: Recovering the trajectories of all moving vehicles from urban-scale camera networks is an attractive but challenging problem in massive video data management. Existing solutions frame it as an iterative image clustering problem: snapshots of the same vehicle are grouped into a cluster, which is then refined according to spatial-temporal attributes. However, these approaches incur expensive iterative clustering overhead and exploit spatial-temporal clues ineffectively. Moreover, they are designed for batch processing and suffer performance degradation when handling newly collected surveillance data. In this paper, we propose TRACER, a novel joint representation clustering framework that recovers trajectories from vehicle snapshots efficiently and accurately and is inherently suited to processing streaming video data. Technically, spatial-temporal features are explicitly extracted to construct the joint representation, eliminating the need for iterative refinement and significantly reducing computational overhead. Furthermore, we present a simple yet effective clustering scheme that performs a one-pass scan over joint representations to generate large-scale clusters. To mitigate the dependency on external data, a joint training method based on self-supervised learning is introduced. We conduct extensive experiments in both batch and streaming modes. The results show that in batch mode, TRACER achieves a speedup of at least $2.3\times$ and improves the recovery $F_{1}$-score by $1.7\%$–$19.6\%$. In the streaming setup, it achieves a $1.1\%$–$27.6\%$ improvement in $F_{1}$-score and reduces the average snapshot processing time by up to 84.8%.
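To make the "one-pass scan over joint representations" idea concrete, below is a minimal sketch of single-pass incremental clustering: each snapshot's joint embedding either joins the most similar existing cluster or starts a new one, so every snapshot is examined exactly once. The function name, cosine-similarity threshold, and running-mean centroid update are illustrative assumptions, not TRACER's actual implementation.

```python
import numpy as np

def one_pass_cluster(joint_reprs, threshold=0.8):
    """Single-pass clustering sketch (illustrative, not the paper's scheme).

    joint_reprs: array of shape (n, d); each row is a joint representation,
    e.g. an appearance embedding concatenated with spatial-temporal features.
    Returns a cluster label per snapshot, assigned in one scan.
    """
    centroids = []  # running mean representation of each cluster
    counts = []     # number of snapshots assigned to each cluster
    labels = []
    for x in joint_reprs:
        x = x / np.linalg.norm(x)  # unit-normalize so dot product = cosine
        best, best_sim = -1, threshold
        for i, c in enumerate(centroids):
            sim = float(x @ (c / np.linalg.norm(c)))
            if sim > best_sim:
                best, best_sim = i, sim
        if best == -1:
            # No cluster is similar enough: open a new one.
            centroids.append(x.copy())
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Incremental mean update keeps the centroid current
            # without revisiting earlier snapshots.
            counts[best] += 1
            centroids[best] += (x - centroids[best]) / counts[best]
            labels.append(best)
    return labels
```

Because clusters are updated incrementally and nothing is re-clustered, the same loop can consume a video stream as snapshots arrive, which is the property the abstract claims makes the framework suitable for streaming data.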