Adaptive Expert Decision for RGB-T Tracking

Published: 01 Jan 2025 · Last Modified: 05 Nov 2025 · IEEE Trans. Circuits Syst. Video Technol. 2025 · CC BY-SA 4.0
Abstract: RGB and thermal infrared (TIR) images provide features with distinct characteristics, so adaptively fusing multi-modal features according to the tracking scenario is crucial for RGB-T tracking. However, current mainstream RGB-T trackers typically apply fixed fusion operations for modal interaction regardless of the scenario; their tracking performance deteriorates because they cannot dynamically adjust the fused multi-modal features to the current scene. To address this issue, we propose AETrack, a novel RGB-T tracking algorithm that dynamically extracts effective modal features in different scenarios for adaptive fusion. First, we design an adaptive expert decision mechanism that employs multiple experts to process the input features, with each expert focusing on and learning different relevant features. Building on this mechanism, we propose a feature-guided method that exploits the correlations between modalities to provide cross-modal information. This guidance allows the expert mechanism to select the most suitable expert for the current scenario, so that AETrack prioritizes effective features and alleviates interference from irrelevant information. Finally, we design a Progressive Cross-modal Fusion operation that achieves multi-level adaptive fusion of effective features across modalities. Benefiting from this adaptive fusion process, AETrack achieves effective multi-modal interaction in different scenarios and thus guides robust tracking. Extensive experiments on three popular benchmarks (LasHeR, RGBT210, and RGBT234) show that AETrack significantly improves tracking performance.
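To illustrate the idea of expert selection guided by cross-modal information, the following is a minimal PyTorch sketch of an expert-decision block in the spirit described above. It is not the authors' implementation: the class name ExpertDecisionBlock, the number of experts, the feed-forward expert design, and the concatenation-based gating are all illustrative assumptions; only the overall pattern (multiple experts, with the routing conditioned on the other modality) follows the abstract.

```python
# Hypothetical sketch, not the paper's code: an expert-decision block where
# one modality's features guide the expert selection for the other modality.
import torch
import torch.nn as nn


class ExpertDecisionBlock(nn.Module):
    """Routes one modality's tokens through several experts; the gate is
    conditioned on the other modality's features (cross-modal guidance)."""

    def __init__(self, dim: int = 256, num_experts: int = 4):
        super().__init__()
        # Each expert is a small feed-forward network that can specialize
        # in different kinds of relevant features.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )
        # The gate sees both the current modality and the guiding modality,
        # so the expert choice can adapt to the scene.
        self.gate = nn.Linear(2 * dim, num_experts)

    def forward(self, x: torch.Tensor, guide: torch.Tensor) -> torch.Tensor:
        # x, guide: (batch, tokens, dim) features of the two modalities.
        weights = torch.softmax(self.gate(torch.cat([x, guide], dim=-1)), dim=-1)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, N, dim, E)
        # Per-token weighted combination of expert outputs.
        return (expert_out * weights.unsqueeze(2)).sum(dim=-1)


if __name__ == "__main__":
    rgb = torch.randn(2, 64, 256)   # RGB token features (toy data)
    tir = torch.randn(2, 64, 256)   # thermal token features (toy data)
    block = ExpertDecisionBlock()
    fused_rgb = block(rgb, guide=tir)  # TIR features guide RGB expert selection
    print(fused_rgb.shape)            # torch.Size([2, 64, 256])
```

In a full tracker this block could be applied symmetrically (TIR guided by RGB as well) and stacked across stages, which is one plausible reading of the multi-level, progressive fusion described in the abstract.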