Abstract: Active Visual Tracking (AVT) is a significant research area with extensive applications in fields such as drones and autonomous driving. AVT involves controlling camera motion based on visual observations in order to track one or more target objects. In dynamic environments, especially in the presence of distractors, AVT faces the challenge of scale variation, which existing methods struggle to handle effectively. To address this problem, this paper proposes a novel Scale Variation Robust Active Visual Tracking method (SVR-AVT). First, we introduce a multi-scale, multi-stage curriculum learning approach: by progressively increasing the complexity of the tracking tasks, the tracker adapts to targets of various scales. Second, we design a scale attention network that adaptively extracts important scale features through multiple convolutional branches with different receptive fields and a scale attention mechanism. Third, we employ maximum position entropy learning to encourage the target to explore the environment more extensively. Experimental results in 3D environments demonstrate that SVR-AVT significantly outperforms existing methods in handling distraction and scale variation, and exhibits strong generalization capability in unseen environments.
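To make the notion of position entropy more concrete, the following is a minimal sketch and not the paper's formulation, which the abstract does not give. It assumes that the target's 2D positions are discretized onto a uniform grid and that exploration is measured as the Shannon entropy of the resulting cell-visit distribution. The function name `position_entropy` and the parameter `cell_size` are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def position_entropy(positions, cell_size=1.0):
    """Shannon entropy (in nats) of the grid cells visited by a target.

    Assumption for illustration: positions are (x, y) pairs, binned into
    square cells of side `cell_size`. This is not taken from the paper.
    """
    if not positions:
        raise ValueError("positions must be non-empty")
    # Count how many times each grid cell was visited.
    cells = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size))
        for x, y in positions
    )
    total = sum(cells.values())
    # H = -sum_c p(c) * log p(c), where p(c) is the visit frequency of cell c.
    return -sum((n / total) * math.log(n / total) for n in cells.values())


# A target that never leaves one cell versus one that visits four cells evenly.
stationary = [(0.2, 0.3)] * 4
roaming = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
print(position_entropy(stationary), position_entropy(roaming))
```

Under these assumptions, a target that stays in one cell has zero entropy, while one that spreads its visits evenly over more cells scores higher, so a reward that increases with this quantity would favor wider exploration.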