Accurate visual representation learning for single object tracking

Hua Bao, Ping Shu, Qijun Wang

Published: 2022, Last Modified: 11 May 2023, Multim. Tools Appl. 2022, Readers: Everyone

Abstract: As a fundamental visual task, single object tracking has witnessed astonishing improvements. However, many factors still need to be addressed to achieve accurate tracking performance. Among them, visual representation is one of the important factors, and it suffers from complex appearance changes. In this work, we propose a rich appearance representation learning strategy for tracking. First, by embedding a saliency feature extractor module, we improve the visual representation ability by fusing the saliency information learned from different convolutional layers. By leveraging the lightweight convolutional neural network VGG-M as the feature extractor backbone, we attain a robust appearance model from deep features with rich semantic information. Second, because the classifier provides significant complementary guidance for location prediction, we propose to generate diverse feature instances of the target by introducing an adversarial learning strategy. Given the generated diverse instances, many complex situations in the tracking process can be effectively simulated, especially occlusion, which follows a long-tail distribution. Third, to optimize bounding box refinement, we employ a precise pooling strategy to obtain high-resolution feature maps. As a result, our approach can effectively capture subtle appearance changes over a long time range. Finally, extensive experiments were conducted on several benchmark datasets, and the results demonstrate that the proposed approach performs favorably against many state-of-the-art algorithms.
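
The abstract does not say which precise pooling operator is used, so the following is only a minimal sketch of the general idea: pooling over a box with continuous (non-integer) coordinates by averaging a bilinearly interpolated feature map, instead of rounding the box to the pixel grid. It approximates the average by dense midpoint sampling rather than an exact integral. The function names `bilinear` and `continuous_roi_pool`, and the parameters `out_size` and `samples`, are hypothetical and do not come from the paper.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D feature map (list of rows) at continuous (y, x)."""
    h, w = len(feat), len(feat[0])
    # Clamp the query point to the valid extent of the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feat[y0][x0] + dx * feat[y0][x1]
    bottom = (1 - dx) * feat[y1][x0] + dx * feat[y1][x1]
    return (1 - dy) * top + dy * bottom


def continuous_roi_pool(feat, box, out_size=2, samples=8):
    """Pool a continuous box into an out_size x out_size grid.

    box = (y_min, x_min, y_max, x_max) in feature-map coordinates.
    Each output bin is the mean of samples x samples interpolated values
    taken at the midpoints of a regular grid inside the bin.
    """
    y_min, x_min, y_max, x_max = box
    bin_h = (y_max - y_min) / out_size
    bin_w = (x_max - x_min) / out_size
    out = [[0.0] * out_size for _ in range(out_size)]
    for i in range(out_size):
        for j in range(out_size):
            total = 0.0
            for a in range(samples):
                y = y_min + (i + (a + 0.5) / samples) * bin_h
                for b in range(samples):
                    x = x_min + (j + (b + 0.5) / samples) * bin_w
                    total += bilinear(feat, y, x)
            out[i][j] = total / (samples * samples)
    return out


# Toy feature map whose value equals the column index.
feat = [[float(x) for x in range(4)] for _ in range(4)]
pooled = continuous_roi_pool(feat, (0.5, 0.5, 2.5, 2.5))
```

Because the box coordinates are never rounded, a small shift of the box produces a small change in the pooled features, which is the property that makes this family of operators attractive for bounding box refinement.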
