Keywords: Object Tracking, Open-Vocabulary
Abstract: Open-vocabulary multi-object tracking (OVMOT) is a challenging new task that requires detecting and tracking objects of diverse categories in videos, covering both seen categories (base classes) and unseen categories (novel classes). It combines the difficulties of open-vocabulary object detection (OVD) and multi-object tracking (MOT). Existing OVMOT approaches typically combine OVD and MOT methods as separate modules and treat the problem largely from an image-centric perspective. In this paper, we propose OVTracker, a novel method that integrates MOT-relevant object states and video-centric training to address the task from a video object tracking standpoint. First, we exploit the tracking-related states of objects during tracking and propose a new prompt-guided attention mechanism for more accurate localization and classification (detection) of time-varying objects. Second, we leverage raw video data without annotations by formulating a self-supervised object similarity learning strategy to facilitate temporal object association (tracking). Experimental results show that OVTracker outperforms existing methods, establishing a new state of the art for open-vocabulary tracking.
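The abstract describes two components: a prompt-guided attention mechanism for detection and a self-supervised object similarity loss for temporal association. The sketch below is a minimal, hypothetical illustration of these two ideas, not the authors' implementation; the module name `PromptGuidedAttention`, the function `association_similarity_loss`, and all tensor shapes are assumptions made for illustration only.

```python
# Minimal sketch (assumed shapes and names, not the paper's actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptGuidedAttention(nn.Module):
    """Hypothetical module: class-prompt embeddings attend over per-frame region features."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, prompt_emb: torch.Tensor, region_feats: torch.Tensor) -> torch.Tensor:
        # prompt_emb:   (B, C, D) text embeddings of category prompts (base + novel classes)
        # region_feats: (B, R, D) region/proposal features from the current frame
        # Returns prompt-conditioned features of shape (B, C, D).
        out, _ = self.attn(query=prompt_emb, key=region_feats, value=region_feats)
        return out


def association_similarity_loss(feat_t: torch.Tensor, feat_t1: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style self-supervised loss: the i-th object embedding at frame t should
    match the i-th embedding at frame t+1 (e.g., two views of the same object taken
    from raw, unannotated video)."""
    feat_t = F.normalize(feat_t, dim=-1)    # (N, D)
    feat_t1 = F.normalize(feat_t1, dim=-1)  # (N, D)
    logits = feat_t @ feat_t1.t() / temperature
    targets = torch.arange(feat_t.size(0), device=feat_t.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    B, C, R, D, N = 2, 5, 20, 256, 8
    pga = PromptGuidedAttention(dim=D)
    fused = pga(torch.randn(B, C, D), torch.randn(B, R, D))
    loss = association_similarity_loss(torch.randn(N, D), torch.randn(N, D))
    print(fused.shape, loss.item())
```

The contrastive formulation here stands in for whatever similarity objective the paper actually uses; it only conveys how unannotated frame pairs could supervise association.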
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1461