Keywords: multi-object-tracking, spatial-temporal, graph matching
TL;DR: To address viewpoint variations and occlusions, this paper proposes a novel Spatial-Temporal Tracklet Graph Matching paradigm (STAR).
Abstract: Existing tracking-by-detection Multi-Object Tracking methods mainly rely on associating objects with tracklets using motion and appearance features. However, variations in viewpoint and occlusions can result in discrepancies between the features of current objects and those of historical tracklets. To tackle these challenges, this paper proposes a novel Spatial-Temporal Tracklet Graph Matching paradigm (STAR). The core idea of STAR is to achieve long-term, reliable object association by associating "tracklet clips" (TCs). TCs are segments of confidently associated multi-object trajectories, which are linked through graph matching. Specifically, STAR initializes TCs using a Confident Initial Tracklet Generator (CITG) and constructs a TC graph via Tracklet Clip Graph Construction (TCGC). In TCGC, each object in a TC is treated as a vertex, with appearance and local topology features encoded on the vertex. The vertices and edges of the TC graph are then updated through message propagation to capture higher-order features. Finally, a Tracklet Clip Graph Matching (TCGM) method is proposed to associate the TCs efficiently and accurately through graph matching. STAR is model-agnostic, allowing for seamless integration with existing methods to enhance their performance. Extensive experiments on diverse datasets, including MOTChallenge, DanceTrack, and VisDrone2021-MOT, demonstrate the robustness and versatility of STAR, significantly improving tracking performance under challenging conditions.
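To make the abstract's pipeline concrete, here is a minimal, illustrative sketch of the two steps it describes: one round of message propagation over a tracklet-clip graph (blending each vertex's appearance feature with the mean of its neighbors'), followed by matching vertices across two TC graphs by cosine similarity. This is not the paper's actual TCGC/TCGM implementation; the function names, the single-step averaging update, and the greedy assignment are all simplifying assumptions for illustration.

```python
import numpy as np

def propagate(features, adj, alpha=0.5):
    """One illustrative round of message passing on a TC graph.

    Each vertex feature is blended with the mean feature of its
    neighbors, so vertices pick up local-topology context.
    (The paper's TCGC update is richer; this is a stand-in.)
    """
    deg = adj.sum(axis=1, keepdims=True)
    neighbor_mean = adj @ features / np.maximum(deg, 1.0)
    return alpha * features + (1.0 - alpha) * neighbor_mean

def match_clips(feat_a, adj_a, feat_b, adj_b):
    """Greedily match vertices of two TC graphs by cosine similarity
    of their propagated features (a simplified stand-in for TCGM)."""
    fa = propagate(feat_a, adj_a)
    fb = propagate(feat_b, adj_b)
    fa = fa / np.linalg.norm(fa, axis=1, keepdims=True)
    fb = fb / np.linalg.norm(fb, axis=1, keepdims=True)
    sim = fa @ fb.T  # pairwise cosine similarity
    pairs, used = {}, set()
    for i in range(sim.shape[0]):
        # Pick the most similar vertex in clip B not yet assigned.
        for j in np.argsort(-sim[i]):
            if int(j) not in used:
                pairs[i] = int(j)
                used.add(int(j))
                break
    return pairs

# Toy example: clip B contains the same three objects as clip A,
# in a different order, on fully connected TC graphs.
feat_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
feat_b = np.array([[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]])  # permuted
adj = np.ones((3, 3)) - np.eye(3)

print(match_clips(feat_a, adj, feat_b, adj))  # {0: 2, 1: 0, 2: 1}
```

Because propagation here is permutation-equivariant on a fully connected graph, the permuted clip's vertices map back to their original identities, which is exactly the long-term re-association the abstract aims for.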
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 16690