FastTracker: Real-Time and Accurate Visual Tracking

17 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: MOT, Visual tracking
TL;DR: We propose a generalized MOT framework with occlusion-aware ReID, road-structure-guided refinement, and a new vehicle-focused benchmark, achieving SOTA results on both multi-class and pedestrian tracking benchmarks.
Abstract: Conventional multi-object tracking (MOT) systems are predominantly designed for pedestrian tracking and often generalize poorly to other object categories. This paper presents a generalized tracking framework that handles multiple object types, with particular emphasis on vehicle tracking in complex traffic scenes. The proposed method incorporates two key components: (i) an occlusion-aware re-identification mechanism that improves identity preservation for heavily occluded objects, and (ii) a road-structure-aware tracklet refinement strategy that uses semantic scene priors (such as lane directions, crosswalks, and road boundaries) to improve trajectory continuity and accuracy. In addition, we introduce a new benchmark dataset comprising diverse vehicle classes with frame-level tracking annotations, curated specifically to support the evaluation of vehicle-focused tracking methods. Experimental results show that the proposed approach performs robustly on both the newly introduced dataset and several public benchmarks, highlighting its effectiveness for general-purpose object tracking. Although the framework is designed for generalized multi-class tracking, it also performs strongly on conventional pedestrian benchmarks, with HOTA scores of 66.4 on the MOT17 test set and 65.7 on the MOT20 test set.
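The abstract does not specify how the occlusion-aware re-identification mechanism works. As a rough illustration of the general idea only, the sketch below estimates how much of each detection box is covered by another box and skips the appearance-embedding update for a track when that coverage is high, so that features from an occluded view do not contaminate the stored identity. Everything here is an assumption made for illustration, not the paper's method: the function names `occlusion_ratio` and `update_embedding`, the threshold `occ_thresh`, the exponential-moving-average update, and the use of box overlap as an occlusion proxy (which does not determine which object is in front).

```python
import math

def occlusion_ratio(boxes):
    """For each box (x1, y1, x2, y2), return the largest fraction of its
    area that is covered by any single other box (hypothetical proxy)."""
    ratios = []
    for i, (ax1, ay1, ax2, ay2) in enumerate(boxes):
        area = (ax2 - ax1) * (ay2 - ay1)
        worst = 0.0
        for j, (bx1, by1, bx2, by2) in enumerate(boxes):
            if i == j or area <= 0:
                continue
            # Width and height of the intersection rectangle.
            w = min(ax2, bx2) - max(ax1, bx1)
            h = min(ay2, by2) - max(ay1, by1)
            if w > 0 and h > 0:
                worst = max(worst, (w * h) / area)
        ratios.append(worst)
    return ratios

def update_embedding(track_emb, det_emb, occ, occ_thresh=0.5, momentum=0.9):
    """Moving-average update of a track's appearance embedding.
    The update is skipped when the detection looks heavily occluded."""
    if occ >= occ_thresh:
        return list(track_emb)  # keep the last clean appearance
    mixed = [momentum * t + (1.0 - momentum) * d
             for t, d in zip(track_emb, det_emb)]
    norm = math.sqrt(sum(x * x for x in mixed))
    return [x / norm for x in mixed]  # re-normalize to unit length

# Example: the first two boxes overlap by half; the third is isolated.
boxes = [(0, 0, 10, 10), (5, 0, 15, 10), (100, 100, 110, 110)]
ratios = occlusion_ratio(boxes)
```

In a sketch like this, the threshold trades off stale embeddings (too strict) against corrupted ones (too lenient); a real system would likely also use depth ordering or detector confidence to decide which object is the occluder.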
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 9466