Deep Triply Attention Network for RGBT Tracking

Published: 01 Jan 2023, Last Modified: 05 Mar 2025Cogn. Comput. 2023EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: RGB-Thermal (RGBT) tracking has gained significant attention in the field of computer vision due to its wide range of applications in video surveillance, autonomous driving, and human-computer interaction. This paper focuses on achieving a robust fusion of different modalities for RGBT tracking through attention modeling. We propose an effective triply attentive network for robust RGBT tracking, which consists of a local attention module, a cross-modality co-attention module, and a global attention module. The local attention module enables the tracker to focus on target regions while considering background interference, generated through backpropagation of the score map with respect to the RGB and thermal image pair. To enhance the interaction of different modalities in feature learning, we introduce a co-attention module that selects more discriminative features for both the visible (RGB) and thermal modalities simultaneously. To compensate for the limitations of local sampling, we incorporate a global attention module based on multi-modal information to compute high-quality global proposals. This module not only complements the local search strategy but also re-tracks lost targets when they come back into view. Extensive experiments conducted on three RGBT tracking datasets demonstrate that our proposed method outperforms other RGBT trackers, achieving more competitive results. Specifically, on the LasHeR dataset, the precision rate, normalized precision rate, and success rate reach 57.5%, 51.6%, and 41.0%, respectively. The above state-of-the-art experimental results confirm the effectiveness of our method in exploring the complementary advantages between modalities and achieving robust visual tracking.
Loading