MRM-RETrack: Hybrid Multi-scale Residual and Mamba for RGB-Event Tracking
Abstract: In recent years, RGB-event object tracking has made significant progress, with steadily improving perception and tracking capabilities in dynamic scenes. However, existing methods are predominantly based on CNN or Transformer architectures, which typically incur high computational complexity and memory overhead. The emerging Mamba architecture preserves the ability to model long-range dependencies while significantly reducing memory consumption, opening new avenues for the design of efficient tracking models. Nevertheless, current Mamba-based RGB-event tracking methods still suffer from insufficient feature learning and a lack of cross-modal alignment, which degrades tracking accuracy and overall robustness. This paper proposes MRM-RETrack, a novel RGB-event tracking framework that aims to achieve high-performance, low-memory cross-modal object tracking. Specifically, we introduce a hierarchical local-global feature extraction strategy that integrates a Multi-Scale Residual Module (MSRM) and a Gated Mamba Module (GMM) to jointly enhance fine-grained local feature extraction and long-range dependency capture. Furthermore, we develop an efficient Aligned Difference-Enhanced Mamba module (ADE-Mamba), which explicitly aligns complementary contextual features by focusing on inter-modal discrepancies. To further boost tracking performance, we design an adaptive dual-modal tracking head that dynamically adjusts and fuses the contributions from the RGB and event modalities, enabling precise target localization. Extensive experiments on multiple benchmark datasets demonstrate that our method achieves superior performance in both short-term and long-term tracking tasks.
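The abstract describes the adaptive dual-modal tracking head only at a high level. As a rough illustration of what "dynamically adjusting and fusing" two modalities can mean, the minimal sketch below weights an RGB feature vector and an event feature vector with softmax-normalized confidence scores. It is not the paper's implementation: the function names, the scalar per-modality scores, and the use of a softmax gate are all assumptions made for illustration, and in a trained tracker the scores would presumably come from learned layers.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(
    rgb: Sequence[float],
    event: Sequence[float],
    rgb_score: float,
    event_score: float,
) -> List[float]:
    """Fuse two same-length feature vectors with softmax-normalized weights.

    rgb_score and event_score are hypothetical per-modality confidence
    logits (an assumption of this sketch, not something the abstract defines).
    """
    if len(rgb) != len(event):
        raise ValueError("feature vectors must have the same length")
    w_rgb, w_event = softmax([rgb_score, event_score])
    # Weighted sum: the modality with the higher score dominates the result.
    return [w_rgb * r + w_event * e for r, e in zip(rgb, event)]


if __name__ == "__main__":
    rgb_feat = [1.0, 2.0]
    event_feat = [3.0, 4.0]
    # Equal scores give equal weights, so the result is the element-wise mean.
    print(adaptive_fuse(rgb_feat, event_feat, 0.0, 0.0))
    # A much higher event score shifts the fused features toward the event branch.
    print(adaptive_fuse(rgb_feat, event_feat, -5.0, 5.0))
```

The softmax gate keeps the two weights positive and summing to one, so the fused features stay within the range spanned by the two inputs. This is one common design choice for modality weighting, used here only to make the idea concrete.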