Keywords: Unified Single Object Tracking, Memory, Prompt, Parameter-Efficient Fine-Tuning
Abstract: In this paper, we propose a simple but powerful parameter-efficient fine-tuning (PEFT) framework designed for unified single object trackers. Our framework is built upon two novel components: a Memory-Aware Compression Prompt (MCP) module and Dynamic State Fusion (DSF) modules. MCP compresses memory features into memory-aware prompt tokens that interact deeply with the input sequence throughout the entire backbone, significantly enhancing model performance while keeping the computational load stable. DSF complements the discrete memory features by capturing the continuous dynamic state of the target and progressively introducing the updated dynamic state features from the shallow to the deep layers of the tracker, while preserving high operational efficiency. MCP overcomes the limitation of previous trackers that rely on only a few frames when introducing memory, an approach that significantly increases input length and computational cost. It also addresses the insufficient fusion in existing memory-prompting methods. DSF remedies the lack of dynamic features describing continuous target variation in prior PEFT methods. Based on the MCP and DSF modules, we propose Uni-MDTrack, a tracker that supports tracking across five modalities. Experimental results on 10 datasets spanning five modalities demonstrate that Uni-MDTrack achieves state-of-the-art performance with only 30% of its parameters requiring training. Furthermore, both MCP and DSF exhibit excellent generality, functioning as plug-and-play components that can boost the performance of various trackers. Code will be released for further research.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7007