Visible-thermal multiple object tracking: Large-scale video dataset and progressive fusion approach

Published: 01 Jan 2025, Last Modified: 11 Apr 2025 · Pattern Recognit. 2025 · CC BY-SA 4.0
Abstract: The complementary benefits of visible and thermal infrared data are widely exploited in computer vision tasks such as visual tracking and object detection, but they remain rarely explored in Multiple Object Tracking (MOT). This paper contributes a large-scale Visible–Thermal video benchmark for MOT, named VT-MOT, which offers several key advantages. First, it comprises 582 video sequence pairs with 401,000 frame pairs collected from diverse sources, including surveillance, drone, and handheld platforms. Second, VT-MOT has dense, high-quality annotations, with 3.99 million annotation boxes verified by professionals. To provide a strong baseline, we design a simple yet effective tracking framework that progressively fuses temporal information and the complementary information of the two modalities for robust visible–thermal MOT. Comprehensive experiments validate the superiority of the proposed method over existing state-of-the-art methods, and we outline potential future research directions for visible–thermal MOT. The project is released at https://github.com/wqw123wqw/PFTrack.
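The abstract describes the baseline only as fusing temporal information and cross-modal information "in a progressive manner". As a rough illustration of what such a staged ordering can look like, the sketch below first combines a visible and a thermal feature vector within each frame, then blends that result with the previous frame's fused feature. This is not PFTrack's architecture: the fixed weight `rgb_weight`, the moving-average `momentum`, and all function names are hypothetical stand-ins for the learned fusion modules a real tracker would likely use.

```python
from typing import Iterable, List, Optional, Tuple

Vector = List[float]


def fuse_modalities(rgb: Vector, thermal: Vector, rgb_weight: float) -> Vector:
    """Stage 1 (hypothetical): convex combination of visible and thermal features."""
    if len(rgb) != len(thermal):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")
    return [rgb_weight * v + (1.0 - rgb_weight) * t for v, t in zip(rgb, thermal)]


def fuse_temporal(current: Vector, previous: Optional[Vector], momentum: float) -> Vector:
    """Stage 2 (hypothetical): exponential moving average over fused frame features."""
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    if previous is None:
        # First frame: nothing to blend with yet.
        return list(current)
    return [momentum * p + (1.0 - momentum) * c for p, c in zip(previous, current)]


def progressive_fusion(
    frames: Iterable[Tuple[Vector, Vector]],
    rgb_weight: float = 0.5,
    momentum: float = 0.5,
) -> List[Vector]:
    """Apply modality fusion, then temporal fusion, frame by frame."""
    fused_sequence: List[Vector] = []
    previous: Optional[Vector] = None
    for rgb, thermal in frames:
        per_frame = fuse_modalities(rgb, thermal, rgb_weight)
        previous = fuse_temporal(per_frame, previous, momentum)
        fused_sequence.append(previous)
    return fused_sequence


if __name__ == "__main__":
    # Two toy frames, each a (visible, thermal) pair of 2-D feature vectors.
    frames = [([1.0, 0.0], [0.0, 1.0]), ([1.0, 1.0], [1.0, 1.0])]
    print(progressive_fusion(frames))  # [[0.5, 0.5], [0.75, 0.75]]
```

Ordering the stages this way means each frame's temporal update already sees both modalities; a learned tracker would replace the fixed weights with network modules operating on feature maps rather than short vectors.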