Cross-Modal Stealth: A Coarse-to-Fine Attack Framework for RGB-T Tracker

Published: 10 Apr 2025, Last Modified: 24 Jul 2025; Philadelphia, Pennsylvania, USA; Everyone; CC BY 4.0
Abstract: Current research on adversarial attacks mainly focuses on RGB trackers, with no existing methods for attacking RGB-T cross-modal trackers. To fill this gap and overcome its challenges, we propose a coarse-to-fine cross-modal adversarial patch generation framework, achieving cross-modal stealth. On the one hand, we design a coarse-to-fine architecture grounded in the latent space to progressively and precisely uncover the vulnerabilities of RGB-T trackers. On the other hand, we introduce a correlation-breaking loss that disrupts the modal coupling within the tracker, spanning from the pixel to the semantic level. These two design elements ensure that the proposed method can overcome the obstacles posed by cross-modal information complementarity in implementing attacks. Furthermore, to enhance the reliable application of the proposed adversarial patches in the real world, we develop a point-tracking-based reprojection strategy that effectively mitigates performance degradation caused by multi-angle distortion during imaging. Extensive comparative and generalization experiments demonstrate the superiority of our attack method.