Cross-modal deep interaction and modal-aware aggregation network for visible and infrared tracking

Published: 01 Jan 2025, Last Modified: 07 Nov 2025. Appl. Soft Comput. 2025. License: CC BY-SA 4.0
Abstract: Highlights
• MCFTNet is a multi-branch RGB-TIR fusion network enabling robust RGBT tracking.
• CMDIF performs cross-modal fusion to mine RGB-TIR complementarity for tracking.
• MAAA adaptively aggregates multimodal features to improve tracking perception.
• Hybrid attention improves feature extraction for accurate target localization.
• Advanced results are achieved on GTOT, RGBT234, and LasHeR benchmarks.
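
The highlights do not describe how MAAA computes its aggregation weights, so the following is only a generic sketch of adaptive two-modality aggregation, not the paper's module. It assumes each modality is reduced to a flat feature vector and that a modality's score is its mean activation, a stand-in for whatever learned gating the network actually uses. The names `modality_weights` and `aggregate` are hypothetical.

```python
import math


def modality_weights(rgb_feat, tir_feat):
    """Return softmax weights (w_rgb, w_tir) for the two modalities.

    Assumption: the score of a modality is its mean activation. A real
    network would learn this score instead of using a fixed statistic.
    """
    scores = [sum(rgb_feat) / len(rgb_feat), sum(tir_feat) / len(tir_feat)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return exps[0] / total, exps[1] / total


def aggregate(rgb_feat, tir_feat):
    """Fuse two equal-length feature vectors as a weighted sum."""
    if len(rgb_feat) != len(tir_feat):
        raise ValueError("RGB and TIR features must have the same length")
    w_rgb, w_tir = modality_weights(rgb_feat, tir_feat)
    return [w_rgb * r + w_tir * t for r, t in zip(rgb_feat, tir_feat)]


if __name__ == "__main__":
    rgb = [2.0, 2.0, 2.0]  # strong visible-light response
    tir = [0.0, 0.0, 0.0]  # weak thermal response
    print(modality_weights(rgb, tir))
    print(aggregate(rgb, tir))
```

In this sketch the modality with the stronger response receives the larger weight, which is one simple way to let the fused feature lean on whichever sensor is more informative in a given frame.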