Abstract: 3D single object tracking (3D SOT) remains a challenging task due to the sparsity of point clouds, appearance variations caused by occlusions, and the difficulty of modeling long-term temporal context. Although recent Transformer-based approaches leverage memory mechanisms to propagate temporal information, their quadratic complexity and reliance on discrete historical snapshots limit both efficiency and temporal coherence. To address these limitations, we propose SSMTrack, a novel 3D SOT framework built upon state space models (SSMs), which efficiently models long-term temporal dependencies through a continuously evolving hidden state with linear complexity. Specifically, we introduce a serialization and bidirectional scanning (SBS) strategy to enhance intra-frame feature interactions and design a Target-Aware Encoder (TAE) to extract target cues while maintaining stable temporal representations. Furthermore, we propose a Temporal Causal Shape Learning (TCSL) mechanism that preserves critical historical information while adaptively integrating current inputs, progressively enriching target feature representations over time. Extensive experiments on three benchmark datasets demonstrate that SSMTrack achieves state-of-the-art performance with strong temporal coherence and high efficiency. The code will be released upon publication.
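To make the efficiency claim concrete, here is a minimal, hypothetical sketch of the discrete state space recurrence that underlies SSM-based sequence models: a hidden state h_t = A h_{t-1} + B x_t with readout y_t = C h_t, updated once per step so a length-T sequence costs O(T) rather than the quadratic cost of full attention. All matrix values and function names below are illustrative assumptions, not the SSMTrack implementation.

```python
# Illustrative toy SSM recurrence (pure Python); NOT the paper's code.

def matvec(M, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def ssm_scan(A, B, C, xs):
    """Run the discrete SSM: h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    h = [0.0] * len(A)          # hidden state summarizes all past inputs
    ys = []
    for x in xs:                # one fixed-cost update per step -> O(T) total
        h = vadd(matvec(A, h), matvec(B, x))
        ys.append(matvec(C, h)[0])
    return ys

A = [[0.9, 0.0], [0.1, 0.8]]   # state transition (toy values)
B = [[1.0], [0.0]]             # input projection
C = [[0.0, 1.0]]               # output readout
xs = [[1.0]] + [[0.0]] * 4     # impulse input over 5 "frames"
ys = ssm_scan(A, B, C, xs)     # [0.0, 0.1, 0.17, 0.217, 0.2465]
```

Because the hidden state is carried forward continuously rather than re-attending to stored snapshots, temporal context accumulates at constant memory per step, which is the property the abstract contrasts with memory-bank Transformers.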
External IDs: doi:10.1109/tcsvt.2025.3627667