Multifrequency Integration and Scale-Frequency Linear Attention for Aerial Tracking

Published: 01 Jan 2025, Last Modified: 07 Sept 2025, IEEE Trans. Instrum. Meas. 2025, CC BY-SA 4.0
Abstract: Aerial tracking is an essential component of vision-based measurement and unmanned aerial vehicle (UAV) systems, playing a crucial role in autonomous navigation, intelligent transportation, and remote sensing. However, aerial scenarios present unique challenges such as rapid motion, scale variation, and camera movement. Most existing tracking methods either use convolutional neural networks (CNNs) for feature extraction or rely on Transformers for global feature modeling, but these approaches struggle to balance accuracy and real-time performance. To address this issue, this article proposes MISATrack, an efficient Siamese network model that integrates a frequency-domain encoder (FDE) and a scale-frequency linear attention (SFLA) mechanism. First, we apply the discrete wavelet transform (DWT) to convert the input image into the frequency domain and then enhance its diverse visual information using our multifrequency integration strategy. In addition, we design the SFLA module for hierarchical feature fusion. This scheme greatly facilitates representation learning with little computational overhead, resulting in more robust object tracking. The proposed tracker is evaluated on four UAV tracking benchmarks: DTB70, UAV123, UAV123@10fps, and UAVTrack112. The experimental results indicate that MISATrack outperforms most state-of-the-art trackers while maintaining real-time tracking speed. The code is publicly available at https://github.com/Wang123z/MISATrack
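To make the two mechanisms named in the abstract more concrete, here is a minimal NumPy sketch. It assumes a single-level Haar wavelet (the abstract does not say which wavelet the FDE uses) and substitutes the generic kernelized linear attention with feature map elu(x) + 1 (Katharopoulos et al., 2020) for SFLA, whose exact formulation is not given in the abstract. The function names haar_dwt2, haar_idwt2, elu_plus_one and linear_attention are hypothetical; this is an illustration of the general techniques, not the MISATrack code from the linked repository.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level orthonormal 2D Haar DWT (assumed wavelet, for illustration).

    img: (H, W) array with even H and W.
    Returns subbands (LL, LH, HL, HH), each of shape (H/2, W/2).
    Subband naming conventions for LH/HL differ between libraries.
    """
    img = np.asarray(img, dtype=np.float64)
    if img.ndim != 2 or img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # top row minus bottom row: horizontal edges
    hl = (a - b + c - d) / 2.0  # left column minus right column: vertical edges
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: reassembles the (2h, 2w) image from its subbands."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=np.float64)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out


def elu_plus_one(x):
    """Positive feature map phi(x) = elu(x) + 1 (x + 1 if x > 0, else exp(x))."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v, eps=1e-6):
    """Generic kernelized linear attention (a stand-in, not the SFLA module).

    q, k: (N, d); v: (N, d_v). Computing phi(K)^T V first costs
    O(N * d * d_v) instead of the O(N^2 * d) of softmax attention.
    """
    qf, kf = elu_plus_one(q), elu_plus_one(k)
    kv = kf.T @ v              # (d, d_v): keys and values summarized once
    z = qf @ kf.sum(axis=0)    # (N,): per-query normalizer
    return (qf @ kv) / (z[:, None] + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((8, 8))
    ll, lh, hl, hh = haar_dwt2(img)
    print(ll.shape)  # (4, 4)
    tokens = rng.standard_normal((16, 8))
    print(linear_attention(tokens, tokens, tokens).shape)  # (16, 8)
```

The design point the sketch illustrates: the DWT halves spatial resolution while splitting the signal into one low-frequency and three high-frequency subbands without losing information, and reordering the attention product as phi(Q)(phi(K)^T V) keeps cost linear in the number of tokens, which is the general reason linear attention adds little computational overhead.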