Abstract: Efficient UAV tracking is challenging because of limited onboard computing resources, short battery life, and payload constraints. Trackers based on discriminative correlation filters (DCF) have traditionally been preferred for their efficiency. Recent lightweight deep learning (DL)-based trackers that leverage model compression, however, achieve notable CPU efficiency together with high precision. Existing model compression approaches nevertheless struggle to maintain tracking precision, especially at higher compression rates. This paper addresses this limitation with a novel approach called disentangled representation learning with mutual information maximization (DR-MIM), which has the potential to improve both the accuracy and the efficiency of DL-based trackers, particularly for UAV tracking. The core idea is a disentangled representation that splits the feature space into two parts: identity-related and identity-unrelated features. Only the identity-related features are used for tracking, which makes the feature representation more effective and underlies the gains in precision and efficiency. To further improve efficiency without compromising accuracy, the network is quantized; the resulting model is named DR-MIM (v2). Extensive experiments on four aerial tracking benchmarks demonstrate the superiority of the proposed method. On VisDrone2018, DR-MIM (v2) reaches a GPU speed of 658 FPS while maintaining a precision of 82.1%, underscoring the potential of the proposed approach in real-world UAV tracking scenarios.