Abstract: Visual object tracking (VOT) is a popular and fundamental task in computer vision that aims to locate a target object throughout a video sequence. VOT plays a crucial part in various applications, such as unmanned aerial vehicles (UAVs), autonomous driving, surveillance, and human-computer interaction. However, when visual object tracking is applied in real-world scenarios, numerous challenges arise from environmental factors, including cluttered backgrounds, object deformation, and illumination variation. Addressing these challenges is therefore crucial for modern visual object trackers. In recent years, VOT algorithms based on Siamese neural networks have achieved a favorable trade-off between tracking performance and computational complexity, making them a popular framework for visual object tracking. Siamese-based trackers locate the object by comparing the similarity between a template image of the target and the search region image. However, because they rely on similarity matching, these trackers may produce false tracking results due to their insufficient discriminative ability in challenging scenarios such as cluttered backgrounds. This work proposes a Siamese-based tracker with a masked attention mechanism, which aims to enhance the target feature representation and improve the discriminative ability of the tracker when dealing with cluttered backgrounds. Finally, we evaluate the proposed Siamese-based visual object tracker on the OTB100 test dataset. The experimental results demonstrate improved performance in challenging scenes such as cluttered backgrounds.