Enhanced Semantic Alignment in Transformer Tracking via Position Learning and Force-Directed Attention

ICLR 2025 Conference Submission 1288 Authors

17 Sept 2024 (modified: 13 Oct 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Transformer Tracking; Single Object Tracking; Semantic Alignment; Self-supervised Position Loss; Force-Directed Attention
Abstract: In visual object tracking, one-stream pipelines have become the mainstream framework due to their efficient integration of feature extraction and relationship modeling. However, existing methods still suffer from semantic misalignment: first, the interaction of positional encodings between the two branches causes a mismatch between feature semantics and positional encoding; second, traditional attention mechanisms fail to distinguish between semantic attraction and repulsion among features, so the model misaligns their semantics during processing. To address these issues, we propose an Enhanced Semantic Alignment Transformer Tracker (ESAT) with positional encoding learning and a force-directed attention mechanism. By leveraging a self-supervised positional encoding loss, ESAT separately learns the absolute positional encodings of the target and search branches, distinguishing the locations of different tokens and their positive or negative relationships, thereby enhancing the semantic consistency between positions and features. In addition, it applies a repulsion-attraction mechanism to the self-attention module, simulating dynamic interactions between nodes to improve feature discrimination. Extensive experiments on multiple public tracking datasets show that our method outperforms many existing pipelines and achieves superior performance on five challenging benchmarks.
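The abstract does not give the exact formulation of the self-supervised positional encoding loss. As a minimal, hypothetical sketch: assuming each branch owns its own learnable absolute encoding table, a small head could be trained to recover every token's normalized 2D grid coordinate from its encoding, which forces the encodings of the two branches to stay location-discriminative. The class name, the MSE objective, and the linear head below are illustrative assumptions, not the authors' method:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchPositionLoss(nn.Module):
    """Hypothetical per-branch loss: each absolute positional encoding must
    reveal its own (row, col) location on the branch's token grid."""

    def __init__(self, grid_hw, dim):
        super().__init__()
        h, w = grid_hw
        # One learnable absolute encoding per token of this branch.
        self.pos_embed = nn.Parameter(torch.zeros(1, h * w, dim))
        nn.init.trunc_normal_(self.pos_embed, std=0.02)
        self.head = nn.Linear(dim, 2)  # predicts normalized (row, col) in [0, 1]
        ys, xs = torch.meshgrid(
            torch.linspace(0, 1, h), torch.linspace(0, 1, w), indexing="ij"
        )
        self.register_buffer("coords", torch.stack([ys, xs], dim=-1).view(1, -1, 2))

    def forward(self):
        # Loss is small only if each encoding uniquely identifies its location.
        return F.mse_loss(self.head(self.pos_embed), self.coords)
```

In a full tracker, one such loss per branch (target and search) would presumably be added to the tracking objective so the two encoding tables remain distinct; the weighting is not specified in the abstract.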
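Likewise, the repulsion-attraction mechanism is only named, not specified. One plausible reading is that the signed query-key similarity is kept, so that repelling tokens contribute negatively to the aggregated value instead of having their sign erased by a softmax. The tanh bounding and absolute-sum normalization below are assumptions made to keep the sketch stable and self-contained:

```python
import torch

def force_directed_attention(q, k, v, eps=1e-6):
    """Hypothetical signed-attention step. q, k, v: (batch, tokens, dim)."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d**0.5  # signed similarities
    forces = torch.tanh(scores)                # bounded attraction (+) / repulsion (-)
    # Normalize by total force magnitude; unlike softmax, negatives survive,
    # so semantically repelling tokens subtract from the output.
    weights = forces / (forces.abs().sum(dim=-1, keepdim=True) + eps)
    return weights @ v
```

Under this reading, a standard softmax attention would assign repelling tokens merely small positive weights, whereas the signed variant lets them actively push features apart, which matches the abstract's "dynamic interactions between nodes".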
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find the authors' identities.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1288