Abstract: Effectively modeling and utilizing spatiotemporal features from RGB and other modalities (e.g., depth, thermal, and event data, denoted as X) is the core of RGB-X tracker design.
Existing methods often employ two parallel branches to separately process the RGB and X input streams, requiring the model to simultaneously handle two dispersed feature spaces, which complicates both the model structure and computation process.
More critically, intra-modality spatial modeling within each dispersed space incurs substantial computational overhead, limiting resources for inter-modality spatial modeling and temporal modeling.
To address this, we propose a novel tracker, CSTrack, which focuses on modeling Compact Spatiotemporal features to achieve simple yet effective tracking.
Specifically, we first introduce an innovative Spatial Compact Module that integrates the RGB-X dual input streams into a compact spatial feature, enabling thorough intra- and inter-modality spatial modeling.
Additionally, we design an efficient Temporal Compact Module that compactly represents temporal features by constructing a refined target distribution heatmap.
Extensive experiments validate the effectiveness of our compact spatiotemporal modeling method, with CSTrack achieving new state-of-the-art (SOTA) results on mainstream RGB-X benchmarks. The code and models will be released at: https://github.com/XiaokunFeng/CSTrack.
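The two ideas in the abstract can be illustrated with a toy sketch. It is not the CSTrack implementation: the fusion rule (element-wise addition of aligned RGB and X tokens), the binary box heatmap, and every function name below are assumptions chosen only to make the notion of a "compact" feature concrete. The released code at the repository above is the reference for the actual modules.

```python
from typing import List, Tuple

Vector = List[float]


def compact_spatial(rgb_tokens: List[Vector], x_tokens: List[Vector]) -> List[Vector]:
    """Fuse two aligned token sequences into one sequence of the same length.

    Assumption: fusion is plain element-wise addition. This only illustrates
    replacing two feature spaces with a single compact one.
    """
    if len(rgb_tokens) != len(x_tokens):
        raise ValueError("RGB and X token sequences must have the same length")
    return [[r + x for r, x in zip(rt, xt)] for rt, xt in zip(rgb_tokens, x_tokens)]


def box_heatmap(grid_h: int, grid_w: int, box: Tuple[int, int, int, int]) -> List[float]:
    """Build a normalized, row-major heatmap that is uniform inside a box.

    Assumption: `box` is (x0, y0, x1, y1) in grid cells, half-open on the
    right and bottom. A real tracker would use a learned or refined map.
    """
    x0, y0, x1, y1 = box
    weights = [
        1.0 if (x0 <= col < x1 and y0 <= row < y1) else 0.0
        for row in range(grid_h)
        for col in range(grid_w)
    ]
    total = sum(weights)
    if total == 0:
        raise ValueError("box does not cover any grid cell")
    return [w / total for w in weights]


def compact_temporal(tokens: List[Vector], heatmap: List[float]) -> Vector:
    """Summarize a frame's tokens as one heatmap-weighted average vector."""
    if len(tokens) != len(heatmap):
        raise ValueError("one heatmap weight is required per token")
    dim = len(tokens[0])
    return [sum(w * t[d] for w, t in zip(heatmap, tokens)) for d in range(dim)]


# Toy 2x2 token grid with 2-dimensional features per token.
rgb = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
x = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0], [0.0, 4.0]]

fused = compact_spatial(rgb, x)          # one token sequence instead of two
heat = box_heatmap(2, 2, (0, 0, 1, 2))   # target occupies the left column
summary = compact_temporal(fused, heat)  # one vector carried to later frames
print(fused, heat, summary)
```

In this sketch all later processing operates on `fused` rather than on two separate streams, and each past frame contributes a single `summary` vector instead of its full token grid, which is the sense in which both features are compact.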
Lay Summary: When tracking objects using different types of data like regular images (RGB), depth, thermal, and event data, it's crucial to efficiently merge and analyze these sources. Current tracking methods often process regular images and additional data streams separately, which leads to complicated models and increased computational demands. This complexity hinders the ability to effectively merge spatial data from different sources and track changes over time.
To tackle these challenges, we introduce CSTrack, a new tracking method that focuses on simplifying and enhancing the way data is combined and analyzed. CSTrack employs a unique Spatial Compact Module to merge image and additional data streams into a unified spatial feature. This integration allows for effective modeling across different types of data. Moreover, CSTrack uses a Temporal Compact Module to represent changes over time efficiently, refining how moving objects are tracked.
Extensive experiments validate our approach: CSTrack achieves new state-of-the-art results on mainstream RGB-X benchmarks, showing that a simple design can still track objects effectively.
Link To Code: https://github.com/XiaokunFeng/CSTrack
Primary Area: Deep Learning->Algorithms
Keywords: RGB-X Tracking, Multimodal Tracking, Compact Spatiotemporal Features
Submission Number: 1757