Abstract: Exploiting temporal sequences is crucial for tracking in complex scenarios, particularly under occlusion and deformation. However, existing methods often rely on unrefined raw images or on computationally expensive temporal fusion modules, both of which restrict the length of the temporal sequences that can be used. This study proposes a novel appearance compression strategy and a temporal feature fusion module, which together significantly enhance the tracker’s ability to exploit long-term temporal sequences. Based on these designs, we propose LTSTrack, a tracker that leverages a Long-term Temporal Sequence containing historical context across 300 frames. First, we present a simple yet effective appearance compression strategy to extract target appearance features from each frame and compress them into …