Adaptive Visual Prompting for Effective Satellite Video Tracking

Jiahao Wang, Fang Liu, Licheng Jiao, Hao Wang, Shuo Li, Yanbiao Ma, Lingling Li, Puhua Chen, Xu Liu, Mengjia Wang

Published: 01 Jan 2026, Last Modified: 13 Mar 2026, IEEE Transactions on Multimedia, CC BY-SA 4.0
Abstract: Satellite video tracking presents significant challenges due to unpredictable target variations, environmental disturbances, and occlusions. Existing approaches either rely on auxiliary modalities or require full fine-tuning of foundation models, resulting in excessive parameter sensitivity and poor generalization. Meanwhile, conventional prompt-based tuning updates parameters at only a single location, limiting its ability to adapt to complex appearance changes. To address these limitations, we propose Adaptive Visual Prompting for Effective Satellite Video Tracking (AVPTrack). Unlike conventional prompts, the proposed Super Prompts dynamically refine the original template at multiple distinct positions. This multi-location adaptation allows for fine-grained representation learning, enabling the tracker to better capture target variations and resist environmental disturbances. Additionally, Dynamic Templates are introduced to mitigate tracking failures in highly challenging scenarios, such as occlusions and background clutter, ensuring robust target localization. Furthermore, the Template Selection Adapter (TSA) selects the most relevant templates in real time, enhancing tracking efficiency. These components are optimized during training while all other parameters are kept frozen, ensuring parameter efficiency. We also investigate the relationship between fine-tuning proportions and learning rates to optimize model performance. Extensive evaluations on the SV248S, SatSOT, and VISO datasets demonstrate the superior adaptability and robustness of AVPTrack compared to existing methods.
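The general idea of refining a template at several positions while the backbone stays frozen can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names, the additive form of the prompts, the tiny 2x2 "layers", and the choice of insertion points are hypothetical and are not taken from AVPTrack itself, whose actual architecture is not described in the abstract.

```python
# Toy illustration only: names, shapes, and the additive prompt form are
# hypothetical and do not reproduce AVPTrack's actual design.

def matvec(matrix, vec):
    """Multiply a small dense matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def forward(template, frozen_layers, prompts):
    """Pass a template feature through frozen layers, adding a learnable
    prompt vector before every layer index that appears in `prompts`."""
    x = list(template)
    for idx, layer in enumerate(frozen_layers):
        if idx in prompts:  # refinement at one of several positions
            x = [xi + pi for xi, pi in zip(x, prompts[idx])]
        x = matvec(layer, x)  # stand-in for a frozen backbone layer
    return x

def trainable_fraction(frozen_layers, prompts):
    """Share of parameters that would be updated if only prompts are trained."""
    n_prompt = sum(len(p) for p in prompts.values())
    n_frozen = sum(len(row) for layer in frozen_layers for row in layer)
    return n_prompt / (n_prompt + n_frozen)

# Three frozen 2x2 layers; prompts are injected before layers 0 and 2.
frozen_layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[1.0, 1.0], [0.0, 1.0]],
]
prompts = {0: [0.5, -0.5], 2: [1.0, 0.0]}
template = [1.0, 2.0]

refined = forward(template, frozen_layers, prompts)
baseline = forward(template, frozen_layers, {})
fraction = trainable_fraction(frozen_layers, prompts)
```

In this sketch only the entries of `prompts` would receive gradient updates, so `fraction` stands in for the fine-tuning proportion whose interaction with the learning rate the abstract says the authors study.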