DDCTrack: Dynamic Token Sampling for Efficient UAV Transformer Tracking

Published: 01 Jan 2024, Last Modified: 12 Jun 2025 | ICPR (15) 2024 | CC BY-SA 4.0
Abstract: Although state-of-the-art transformer models have shown promising results in unmanned aerial vehicle (UAV) tracking, they come with high computational demands. Existing tracking methods try to reduce computational complexity by controlling the number of tokens, but this strategy is not effective for every tracker. We therefore propose DDCTrack, an efficient UAV transformer tracking framework built on dynamic token sampling. Unlike previous transformer-based trackers, our method needs no complex head networks such as classification and regression heads. It relies solely on our newly designed encoder, which comprises three key components: Dynamic Position Embedding, Dynamic Token Sampler, and Convolutional Feed-Forward Network. The encoder enhances visual representation by scoring and dynamically sampling tokens, so the token count is flexible and adapts to target changes within each frame. We use a simple image-sequence contrastive loss as the training objective. Our approach not only simplifies the tracking framework but also achieves state-of-the-art performance on multiple challenging datasets at real-time speeds.
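To make the idea of scoring and dynamically sampling tokens concrete, the following is a minimal sketch, not the paper's Dynamic Token Sampler. The abstract does not specify how tokens are scored or selected, so everything here is an assumption for illustration: the linear scorer with a sigmoid, the names `score_tokens` and `sample_tokens`, and the `threshold` and `min_keep` parameters are all hypothetical. The sketch only shows how a score threshold, as opposed to a fixed top-k, lets the number of kept tokens vary from frame to frame.

```python
import math
from typing import List, Sequence, Tuple

Token = Sequence[float]


def score_tokens(tokens: Sequence[Token], weights: Sequence[float]) -> List[float]:
    """Score each token in (0, 1) with a linear projection followed by a sigmoid.

    The linear scorer is an illustrative assumption, not the paper's scorer.
    """
    scores = []
    for tok in tokens:
        logit = sum(t * w for t, w in zip(tok, weights))
        # Numerically stable sigmoid for both signs of the logit.
        if logit >= 0:
            scores.append(1.0 / (1.0 + math.exp(-logit)))
        else:
            e = math.exp(logit)
            scores.append(e / (1.0 + e))
    return scores


def sample_tokens(
    tokens: Sequence[Token],
    weights: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off on the token score
    min_keep: int = 1,       # hypothetical floor so a frame never loses all tokens
) -> Tuple[List[int], List[Token]]:
    """Keep the tokens whose score reaches the threshold.

    The number of kept tokens depends on the frame content, so it can
    differ between frames. Returns the kept indices (in original order)
    and the kept tokens.
    """
    scores = score_tokens(tokens, weights)
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    if len(keep) < min_keep:
        # Fall back to the highest-scoring tokens, restored to original order.
        ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
        keep = sorted(ranked[:min_keep])
    return keep, [tokens[i] for i in keep]
```

Calling `sample_tokens` on two frames with the same `weights` can return different numbers of tokens, which is the flexible token count the abstract describes. A fixed top-k selection would instead return the same count for every frame.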