ECO++: Adaptive deep feature fusion target tracking method in complex scene

Yuhan Liu, He Yan, Qilie Liu, Wei Zhang, Junbin Huang

Published: 2024, Last Modified: 13 Apr 2025. Digit. Commun. Networks 2024. CC BY-SA 4.0

Abstract: Efficient Convolution Operator (ECO) algorithms have achieved impressive performance in visual tracking. However, the feature extraction network of ECO is poorly suited to capturing the correlation features of occluded and blurred targets across long-range frames in complex scenes. Moreover, its fixed-weight fusion strategy does not exploit the complementary properties of deep and shallow features. In this paper, we propose a new target tracking method, ECO++, which uses adaptive deep feature fusion in complex scenes and improves on ECO in two respects. First, we construct a new temporal convolution mode and use it to replace the underlying convolution layer in the Conformer network, obtaining an improved Conformer network. Second, we adaptively fuse the deep features output by the improved Conformer network by combining the Peak to Sidelobe Ratio (PSR), frame smoothness scores, and an adaptive adjustment weight. Extensive experiments on the OTB-2013, OTB-2015, UAV123, and VOT2019 benchmarks demonstrate that the proposed approach outperforms state-of-the-art algorithms in tracking accuracy and robustness in complex scenes with occluded, blurred, and fast-moving targets.
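To make the PSR-driven fusion idea concrete, the sketch below computes the standard Peak to Sidelobe Ratio of a correlation response map and uses it to weight several response maps. It is only an illustration under stated assumptions: the abstract does not give the fusion formula, so the normalized-PSR weighting, the function names `peak_to_sidelobe_ratio` and `fuse_responses`, and the `exclude` window size are all hypothetical, and the frame smoothness score is not modeled here.

```python
import numpy as np


def peak_to_sidelobe_ratio(response, exclude=5):
    """Standard PSR: (peak - mean(sidelobe)) / std(sidelobe).

    The sidelobe is every pixel outside a (2*exclude+1)^2 window centred on
    the peak. The window size is an assumption, not taken from the paper.
    """
    response = np.asarray(response, dtype=np.float64)
    r, c = np.unravel_index(np.argmax(response), response.shape)
    peak = response[r, c]

    # Mask out the neighbourhood of the peak; what remains is the sidelobe.
    mask = np.ones(response.shape, dtype=bool)
    mask[max(r - exclude, 0): r + exclude + 1,
         max(c - exclude, 0): c + exclude + 1] = False
    sidelobe = response[mask]
    if sidelobe.size == 0:
        return 0.0
    return float((peak - sidelobe.mean()) / (sidelobe.std() + 1e-12))


def fuse_responses(responses, exclude=5):
    """Weight each response map by its normalized PSR (hypothetical rule).

    Returns the fused map and the weights. Falls back to uniform weights
    when no map has a positive PSR.
    """
    psr = np.array([max(peak_to_sidelobe_ratio(r, exclude), 0.0)
                    for r in responses])
    total = psr.sum()
    if total > 0:
        weights = psr / total
    else:
        weights = np.full(len(responses), 1.0 / len(responses))
    fused = sum(w * np.asarray(r, dtype=np.float64)
                for w, r in zip(weights, responses))
    return fused, weights
```

Under this weighting, a response map with a sharp, isolated peak (high PSR) contributes more to the fused map than a flat or noisy one, which matches the intuition behind replacing a fixed-weight fusion with a confidence-driven one.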