Abstract: In recent years, trackers based on neural networks have demonstrated excellent tracking performance. Compared with general tracking tasks, aerial tracking places more stringent demands on the operational efficiency of the tracker. Furthermore, since aerial devices such as unmanned aerial vehicles usually shoot from a high altitude, the tracked target occupies fewer pixels, which leaves little target discrimination information and makes the tracker more susceptible to interference from cluttered backgrounds. Unfortunately, existing trackers usually model the entire template region uniformly, without distinguishing target from background, which tends to confuse target and background information in the template and reduces the robustness of the tracker in complex scenes. Additionally, because learning temporal context information is risky, most tracking networks either forgo it or draw on only a single feature layer to learn it, which makes it difficult to accurately supplement the scarce target discrimination information. To this end, we propose an efficient template distinction modeling tracker with temporal contexts, called ETDMT, designed to improve the robustness of aerial tracking in complex scenes through a combination of template distinction modeling and temporal context analysis. The tracker employs a template distinction modeling transformer network, which distinguishes between target and background elements and models each kind of element differently to alleviate the background interference problem prevalent in aerial tracking. The temporal contexts of the tracker are then complemented by a global-local spatial awareness update module, which enhances the tracker’s understanding of the latest target state by comprehensively evaluating and adjusting dynamic templates. Extensive experiments demonstrate that the proposed ETDMT achieves advanced aerial tracking performance while running efficiently.