Keywords: MOT, Visual tracking
TL;DR: We propose a generalized MOT framework with occlusion-aware ReID, road-structure-guided refinement, and a new vehicle-focused benchmark, achieving SOTA results on both multi-class and pedestrian tracking benchmarks.
Abstract: Conventional multi-object tracking (MOT) systems are predominantly designed
for pedestrian tracking and often exhibit limited generalization to other object
categories. This paper presents a generalized tracking framework capable of
handling multiple object types, with a particular emphasis on vehicle tracking in
complex traffic scenes. The proposed method incorporates two key components: (i)
an occlusion-aware re-identification mechanism that enhances identity preservation
for heavily occluded objects, and (ii) a road-structure-aware tracklet refinement
strategy that utilizes semantic scene priors—such as lane directions, crosswalks,
and road boundaries—to improve trajectory continuity and accuracy. In addition,
we introduce a new benchmark dataset comprising diverse vehicle classes with
frame-level tracking annotations, specifically curated to support evaluation of
vehicle-focused tracking methods. Extensive experimental results demonstrate that
the proposed approach achieves robust performance on both the newly introduced
dataset and several public benchmarks, highlighting its effectiveness in
general-purpose object tracking. While our framework is designed for generalized
multi-class tracking, it also achieves strong performance on conventional pedestrian
benchmarks, with HOTA scores of 66.4 on the MOT17 test set and 65.7 on the MOT20 test set.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 9466