Streaming Spatial-Temporal Prompt Learning for RGB-T Tracking

ICLR 2025 Conference Submission 9719 Authors

27 Sept 2024 (modified: 13 Oct 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Prompt Learning, Multimodal, RGB-T Tracking
Abstract: Effective spatial-temporal information about correlated targets is crucial during multimodal interaction in RGB-T tracking. However, most existing methods use only spatial information for template-search matching, or merely introduce an additional dynamic template that provides sparse temporal perception. These approaches overlook rich temporal cues across consecutive video frames, such as target appearance changes and motion trajectories. To establish effective spatial-temporal associations during multimodal interaction, we propose a video-level RGB-T tracking paradigm based on prompt learning, termed PromptTrack. It densely models the spatial-temporal relationships of targets in multimodal contexts by incorporating streaming spatial-temporal prompts within a continuous sequence of video frames. Specifically, PromptTrack learns target changes and motion trajectories from historical frames through a streaming temporal prompt for each modality, and then learns a multimodal spatial prompt conditioned on the temporal prompt to effectively leverage complementary multimodal information. Benefiting from the proposed spatial-temporal prompt learning method, PromptTrack exhibits superior target localization capability and robustness in complex tracking scenarios. The prompt-based tracking paradigm can also be readily extended to other tracking domains such as RGB-D and RGB-E. Extensive experiments on three prevailing benchmark datasets demonstrate that our method achieves new state-of-the-art performance. In particular, PromptTrack achieves a Precision score of 76.2% and a Success score of 60.7% on the LasHeR dataset while running at a real-time speed of 35 FPS. Code and models will be released.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 9719