Siamese Network Based on MLP and Multi-head Cross Attention for Visual Object Tracking

Published: 01 Jan 2023, Last Modified: 16 May 2025, ICANN (10) 2023, CC BY-SA 4.0
Abstract: Visual object tracking is an important prerequisite in many applications. However, the performance of a tracking system often depends on the quality of the visual object’s feature representation and on whether the tracker can identify the best match of the target template in the search area. To address these challenges, we propose a new method based on Multi-Layer Perceptron (MLP) and multi-head cross attention. First, a new MLP-based module is designed to enhance the input features by refining the internal association between their spatial and channel dimensions. Second, an improved head network is constructed for predicting the location of the target, in which the multi-head cross attention mechanism is used to find the optimal matching between the template and the search area. Experiments on four datasets show that the proposed method offers competitive tracking performance compared with several recent baseline methods. The code will be available at https://github.com/SYLan2019/MLP-MHCA.
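To make the matching step concrete, the following is a minimal sketch of generic multi-head cross attention in NumPy. It is not the authors' head network, whose details the abstract does not give. It assumes that search-area features act as queries and template features as keys and values; the function name, the weight matrices `w_q`, `w_k`, `w_v`, `w_o`, and all shapes are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(search, template, w_q, w_k, w_v, w_o, num_heads):
    """Generic cross attention (assumed roles: search = queries, template = keys/values).

    search:   (n_s, d) flattened search-area features
    template: (n_t, d) flattened template features
    Returns the attended search features (n_s, d) and the
    per-head attention weights (num_heads, n_s, n_t).
    """
    n_s, d = search.shape
    n_t = template.shape[0]
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    d_h = d // num_heads

    # Project, then split the channel dimension into heads: (heads, tokens, d_h).
    q = (search @ w_q).reshape(n_s, num_heads, d_h).transpose(1, 0, 2)
    k = (template @ w_k).reshape(n_t, num_heads, d_h).transpose(1, 0, 2)
    v = (template @ w_v).reshape(n_t, num_heads, d_h).transpose(1, 0, 2)

    # Scaled dot-product similarity between every search and template token.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_h)
    weights = softmax(scores, axis=-1)

    # Weighted sum of template values, then merge the heads back together.
    heads = weights @ v
    merged = heads.transpose(1, 0, 2).reshape(n_s, d)
    return merged @ w_o, weights

# Toy example with random features and weights (illustrative sizes only).
rng = np.random.default_rng(0)
d, num_heads = 32, 4
search = rng.standard_normal((64, d))     # e.g. an 8x8 search feature map
template = rng.standard_normal((16, d))   # e.g. a 4x4 template feature map
w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
out, attn = multi_head_cross_attention(search, template, w_q, w_k, w_v, w_o, num_heads)
```

In this sketch each row of `attn` is a distribution over template positions for one search position, which is one way to read "matching between the template and the search area"; a real tracker would learn the projection weights rather than sample them.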