Keywords: Continual Learning, Single Object Tracking, Visual Chain-of-Thought
Abstract: Stable tracking in both daytime and nighttime is essential for applying single object tracking to real-world scenarios. Traditional daytime trackers rely mainly on clear appearance features, so their performance degrades significantly under nighttime conditions. Conversely, nighttime trackers often incorporate low-light enhancement techniques to improve robustness but struggle to maintain comparable accuracy in daytime environments. To address this challenge, we propose a novel framework, termed Visual Chain-of-Thought (VCoT), which reformulates object tracking as a structured reasoning process. VCoT follows a three-stage cognitive path of Observe–Recall-Infer–Memorize: it first observes the current frame and extracts its appearance and motion features; it then retrieves relevant historical prompts from a memory pool and fuses them via an attention mechanism to enable context-aware reasoning; and it finally uses gradient-based importance evaluation to update the memory, selectively retaining the most valuable knowledge. This design allows the model to integrate real-time observations with historical experience while achieving continual learning and effective knowledge transfer across tasks. Extensive experiments on multiple challenging benchmarks demonstrate that VCoT consistently outperforms existing methods under diverse illumination conditions. Code will be available at https://github.com/Gkk10/VCoT.
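The Recall-Infer and Memorize stages described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`recall`, `memorize`, `importance`), the dictionary layout of a prompt, the scaled dot-product form of the attention, and the use of a gradient L2 norm as the importance score are all assumptions made only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recall(query, memory):
    """Recall-Infer (assumed form): attention-weighted fusion of stored prompts.

    `query` is the feature vector of the current frame; each memory entry is a
    hypothetical dict with "key", "value", and "grad" vectors.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, prompt["key"])) / scale
        for prompt in memory
    ]
    weights = softmax(scores)
    value_dim = len(memory[0]["value"])
    fused = [
        sum(w * prompt["value"][i] for w, prompt in zip(weights, memory))
        for i in range(value_dim)
    ]
    return fused, weights


def importance(prompt):
    """Assumed importance score: L2 norm of the loss gradient w.r.t. the prompt."""
    return math.sqrt(sum(g * g for g in prompt["grad"]))


def memorize(memory, new_prompt, capacity):
    """Memorize (assumed form): keep the `capacity` most important prompts."""
    pool = memory + [new_prompt]
    pool.sort(key=importance, reverse=True)
    return pool[:capacity]
```

In this sketch, a prompt whose key aligns with the current-frame query receives a larger attention weight, and a prompt with a small gradient norm is the first to be evicted once the pool exceeds its capacity.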
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 24507