Abstract: Highlights
• Exploiting spatial and temporal context for online tracking.
• Extracting similarity features from video frames using a Transformer.
• Using a Transformer and a feature pyramid together for feature fusion.