State Space Models for Long-Term Temporal Context in 3D Single Object Tracking

Jie Xiao, Yinchao Ma, Yuyang Tang, Chuxin Wang, Tianzhu Zhang

Published: 01 Jan 2025 · Last Modified: 24 Jan 2026 · IEEE Transactions on Circuits and Systems for Video Technology · CC BY-SA 4.0
Abstract: 3D single object tracking (3D SOT) remains a challenging task due to the sparsity of point clouds, appearance variations caused by occlusions, and the difficulty of modeling long-term temporal context. Although recent Transformer-based approaches leverage memory mechanisms to propagate temporal information, their quadratic complexity and reliance on discrete historical snapshots limit both efficiency and temporal coherence. To address these limitations, we propose SSMTrack, a novel 3D SOT framework built upon state space models (SSMs), which efficiently models long-term temporal dependencies through a continuously evolving hidden state with linear complexity. Specifically, we introduce a serialization and bidirectional scanning (SBS) strategy to enhance intra-frame feature interactions and design a Target-Aware Encoder (TAE) to extract target cues while maintaining stable temporal representations. Furthermore, we propose a Temporal Causal Shape Learning (TCSL) mechanism that preserves critical historical information while adaptively integrating current inputs, progressively enriching target feature representations over time. Extensive experiments on three benchmark datasets demonstrate that SSMTrack achieves state-of-the-art performance with strong temporal coherence and high efficiency. The code will be released upon publication.
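The abstract's central claim is that an SSM propagates long-term temporal context through a recurrent hidden state in linear time, rather than attending over discrete historical snapshots with quadratic cost. A minimal sketch of a generic (non-selective) discretized SSM scan illustrates this recurrence; the function name and shapes are illustrative assumptions, not SSMTrack's actual formulation:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Generic linear-time SSM recurrence (illustrative sketch only):
        h_t = A h_{t-1} + B x_t,   y_t = C h_t
    One update per frame, so cost is O(T) in sequence length T,
    versus O(T^2) for full self-attention over the history."""
    d_state = A.shape[0]
    h = np.zeros(d_state)          # continuously evolving hidden state
    ys = []
    for x_t in x:                  # one scalar input per time step
        h = A @ h + B * x_t        # fold the current input into the state
        ys.append(C @ h)           # read out from the accumulated context
    return np.array(ys)
```

The key property this sketch shows is that all past frames are summarized in the fixed-size state `h`, so memory and per-step compute do not grow with sequence length.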