Local Fine-Grained Visual Tracking

Published: 2025, Last Modified: 20 Jan 2026 | IEEE Trans. Multim. 2025 | CC BY-SA 4.0
Abstract: This paper introduces a novel local fine-grained visual tracking task that aims to precisely locate arbitrary local parts of objects. The task is motivated by our observation that in many realistic scenarios, users need to track a local part rather than a holistic object. However, the absence of an evaluation dataset and the distinctive characteristics of local fine-grained targets pose additional challenges for this research. To tackle these issues, this paper first constructs a local fine-grained tracking (LFT) dataset to evaluate tracking performance on local fine-grained targets. Second, it designs a solution to handle the challenges posed by the properties of local objects, including ambiguity and high-proportion backgrounds. The solution consists of a hierarchical adaptive mask mechanism and foreground-background differentiated learning. The former adaptively searches for and masks ambiguity, which drives the network to concentrate on the local target instead of the holistic object. The latter distinguishes foreground from background in an unsupervised manner, which helps mitigate the impact of high-proportion backgrounds. Extensive analytical experiments verify the effectiveness of each submodule of the proposed fine-grained tracker.