Spreading Fine-Grained Prior Knowledge for Accurate Tracking

Jiahao Nie, Han Wu, Zhiwei He, Mingyu Gao, Zhekang Dong

Published: 01 Jan 2022, Last Modified: 16 Nov 2023, IEEE Trans. Circuits Syst. Video Technol. 2022, Readers: Everyone

Abstract: With the widespread use of deep learning in single object tracking, mainstream tracking algorithms treat tracking as a combined classification and regression problem. Classification aims to locate an arbitrary target, and regression aims to estimate the corresponding bounding box. In this paper, we focus on regression and propose a novel box estimation network, which consists of a transformer encoder, target pyramid guide (TPG), and a transformer decoder, target pyramid spread (TPS). Specifically, the transformer encoder TPG is designed to generate fine-grained prior knowledge with an explicit representation of template targets. In contrast to the original transformer encoder, we capture visual dependencies through local-global self-attention and treat the multi-scale target regions as the “local” regions. Using this fine-grained prior knowledge, we design the transformer decoder TPS to spread it to the subsequent search regions with high affinity, so that the bounding boxes can be estimated accurately. Because self-attention fails to model information interaction across channels between the template target and the search regions, we develop a channel-wise cross-attention block within the TPS to compensate. Extensive experiments on the OTB100, UAV123, NFS, VOT2020, VOT2021, LaSOT, LaSOT_ext, TrackingNet and GOT-10k benchmarks show that the proposed box estimation network outperforms most existing box estimation methods. Furthermore, our trackers based on this estimation network exhibit competitive performance against state-of-the-art trackers.
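
The abstract does not specify how the channel-wise cross-attention block is built, so the following is only a minimal NumPy sketch of the general idea: attention weights are computed between channels of the search features and channels of the template features, rather than between spatial tokens. Everything in it is an assumption made for illustration and is not the paper's TPS block: the function name `channel_cross_attention`, the use of mean-pooled channel descriptors, the absence of learned projections, and the residual connection are all hypothetical choices.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def channel_cross_attention(search, template):
    """Illustrative channel-wise cross-attention (assumed design, not the paper's).

    search:   (N_s, C) flattened search-region features
    template: (N_t, C) flattened template-target features
    Returns the updated search features (N_s, C) and the (C, C) channel affinity.
    """
    # Assumption: summarize each channel by its mean over spatial tokens,
    # which lets the template and search regions have different sizes.
    q = search.mean(axis=0)    # (C,) search channel descriptors
    k = template.mean(axis=0)  # (C,) template channel descriptors

    # Row i holds the affinity of search channel i to every template channel.
    affinity = softmax(np.outer(q, k), axis=-1)  # (C, C)

    # Re-mix the search channels with the template-conditioned affinity
    # and add the result back to the input (residual connection, assumed).
    mixed = search @ affinity.T  # (N_s, C)
    return search + mixed, affinity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    search_feats = rng.standard_normal((64, 8))    # e.g. an 8x8 search map, 8 channels
    template_feats = rng.standard_normal((16, 8))  # e.g. a 4x4 template map, 8 channels
    out, attn = channel_cross_attention(search_feats, template_feats)
    print(out.shape, attn.shape)
```

In this sketch the attention matrix has one row and one column per channel, so its size is independent of the spatial resolution of either feature map; that is the property that distinguishes channel-wise attention from the token-wise self-attention mentioned in the abstract.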
