Abstract: Existing visual trackers based on Siamese region proposal networks determine the target's location and size by comparing features of the target template and the search region. However, changes in scale and appearance can make these features inconsistent, degrading tracking performance. To address this issue, we introduce a feature interaction alignment network that learns feature interactions between the target template and the search region, enabling the tracker to better align target-specific features in complex scenarios. Through feature interaction, the tracker performs feature fusion at a finer level. Furthermore, to let existing trackers learn feature interactions, we modify the loss function to better accommodate feature interaction alignment; the optimized loss balances different types of prediction errors more effectively, enhancing the model's adaptability to complex scenarios. Experimental results on four public benchmark datasets show that the proposed feature interaction alignment network improves the accuracy and robustness of the baseline trackers, offering a new direction for improving visual tracking methods based on Siamese region proposal networks.
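The "feature interaction" the abstract describes can be pictured as cross-attention: each search-region feature attends over the template features, producing search features re-expressed in terms of the target's appearance. The following is a minimal, dependency-free sketch of that idea; the function names, shapes, and the use of scaled dot-product attention are illustrative assumptions, not the paper's actual architecture.

```python
# Hedged sketch: scaled dot-product cross-attention between template and
# search-region features, as one plausible form of "feature interaction".
# All names and shapes here are illustrative assumptions.
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(search, template):
    """Each search feature (query) attends over template features (keys/values).

    search:   list of d-dim vectors (search-region features)
    template: list of d-dim vectors (target-template features)
    returns:  list of d-dim vectors, each a convex combination of template features
    """
    d = len(template[0])
    out = []
    for q in search:
        # Similarity of this search feature to every template feature.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in template]
        w = softmax(scores)
        # Weighted sum of template features: the "aligned" output.
        out.append([sum(wi * v[j] for wi, v in zip(w, template))
                    for j in range(d)])
    return out
```

In a real tracker the attended features would then be fused with the original search features (e.g. by residual addition) before the region proposal heads; this sketch only shows the interaction step itself.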
External IDs: dblp:journals/spl/LiuG25