Abstract: Multi-object tracking (MOT) in water surface scenes is crucial for the autonomous navigation of Unmanned Surface Vehicles (USVs). However, existing MOT datasets rarely focus on these scenes. Moreover, the few available water surface MOT datasets contain little data captured onboard and concentrate narrowly on specific marine scenes, leaving a significant gap between them and real-world USV navigation applications. To promote research on USV autonomous navigation, we introduce USVTrack, an MOT benchmark shot entirely onboard that covers diverse and complex water surface scenes, characterized by a high proportion of small objects and varied backgrounds. We also propose USVMOT, an end-to-end method specifically designed for MOT in complex water surface scenes. It improves tracking performance through four key contributions: 1) integrating mask information via knowledge distillation to boost feature discriminability; 2) deploying task-specific auxiliary pathways to alleviate the competition between detection and re-identification (ReID) in end-to-end MOT methods; 3) employing an adaptive high-quality mask generation strategy based on the Segment Anything Model (SAM) that obviates extensive manual annotation; and 4) introducing an object-aware association method that dynamically tailors the tracking strategy according to object size and motion speed. Extensive experiments on the USVTrack benchmark demonstrate that USVMOT outperforms existing methods. Our analysis also reveals that MOT in complex water surface scenes remains challenging, highlighting the need for further advances.
External IDs: doi:10.1109/tcsvt.2025.3595760