CLAT: Convolutional Local Attention Tracker for Real-time UAV Target Tracking System with Feedback Information
Abstract: Real-time UAV visual target tracking systems face the intertwined challenges of balancing tracking speed against tracking performance while keeping the following control robust. In existing tracking systems, global attention mechanisms improve tracking performance but introduce higher computational complexity, which slows down tracking; local attention mechanisms reduce computational complexity but often limit the receptive field. In this paper, we propose a new framework named the Convolutional Local Attention Tracker (CLAT) to address these challenges. First, we design a hierarchical convolutional local attention structure as the feature extractor of CLAT. It applies a convolutional projection before local window partitioning, which establishes connections between non-overlapping windows and expands the receptive field. Second, we introduce a streamlined feature fusion network comprising an unshared-weight convolutional layer and a global attention network. The overall design balances speed and accuracy. Furthermore, to improve the robustness of servo control, we redesign the upper-level controller to integrate all bounding box information. To capture spatiotemporal feedback information in CLAT, we implement a dynamic template update by incorporating an IoU head into the predictor. Extensive experiments on visual tracking benchmarks and in real-world tests demonstrate that CLAT achieves competitive performance. Moreover, we have developed a comprehensive tracking system demonstration capable of precisely tracking targets across various categories. The tracker code will be released at https://github.com/xiaolousun/refine-pytracking.git.
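To make the abstract's core architectural idea concrete, the following is a minimal, hypothetical PyTorch sketch (not the released CLAT code) of a convolutional local attention block: a depthwise convolutional projection is applied before the feature map is partitioned into non-overlapping windows, so neighbouring windows exchange information and the effective receptive field grows beyond a single window. All names (`ConvLocalAttention`, `window_size`, the 3x3 depthwise kernel) are illustrative assumptions.

```python
# Illustrative sketch only: convolutional projection before windowed self-attention.
import torch
import torch.nn as nn


class ConvLocalAttention(nn.Module):
    """Hypothetical block: depthwise conv projection -> local-window self-attention."""

    def __init__(self, dim: int, window_size: int = 7, num_heads: int = 4):
        super().__init__()
        self.window_size = window_size
        # Depthwise 3x3 convolution mixes pixels across window boundaries
        # before the feature map is split into non-overlapping windows.
        self.conv_proj = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H and W assumed divisible by window_size for simplicity.
        B, C, H, W = x.shape
        x = self.conv_proj(x)
        w = self.window_size
        # Partition into non-overlapping windows: (B * num_windows, w*w, C).
        x = x.view(B, C, H // w, w, W // w, w)
        x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        x = self.norm(x)
        # Self-attention restricted to each local window.
        x, _ = self.attn(x, x, x)
        # Reverse the window partition back to (B, C, H, W).
        x = x.reshape(B, H // w, W // w, w, w, C)
        x = x.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
        return x


if __name__ == "__main__":
    block = ConvLocalAttention(dim=64, window_size=7)
    feats = torch.randn(2, 64, 28, 28)   # toy search-region feature map
    print(block(feats).shape)            # torch.Size([2, 64, 28, 28])
```

Restricting attention to w x w windows keeps the cost linear in the number of windows rather than quadratic in the full spatial resolution, while the preceding convolution provides the cross-window connections that pure local windowing lacks.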