Abstract: Existing Siamese-based trackers divide visual tracking into two stages, i.e., feature extraction (backbone subnetwork) and prediction (head subnetwork). However, they mainly apply task-level supervision (classification and regression) and largely overlook feature-level supervision during knowledge learning, which can lead to deficient knowledge interaction among target features and to background interference during online tracking. To address these issues, this paper proposes an educational pattern-guided self-knowledge distillation method that guides Siamese-based trackers to learn feature-level knowledge by themselves and can serve as a generic training protocol for improving any Siamese-based tracker. Our key insight is to employ two educational self-distillation patterns, i.e., focal self-distillation and discriminative self-distillation, to endow the tracker with self-learning ability. The focal self-distillation pattern teaches the tracking network to focus on valuable pixels and channels by decoupling the spatial and channel learning of target features. The discriminative self-distillation pattern maximizes the discrimination between foreground and background features, keeping the tracker from being misled by background pixels. As one of the first attempts to introduce self-knowledge distillation into visual tracking, our method is effective, efficient, and generalizes well, which may be instructive for related research. Code and data are publicly available.
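The following is a minimal sketch, not the authors' released code, of how the two self-distillation patterns described above could be expressed as feature-level training losses in a PyTorch-style setup. All names (`feat_student`, `feat_teacher`, `fg_mask`) and the exact loss formulations (attention-matching for the focal pattern, a cosine-margin hinge for the discriminative pattern) are assumptions for illustration; the actual definitions are given in the paper.

```python
# Hypothetical sketch of the two self-distillation losses (assumed formulations).
import torch
import torch.nn.functional as F


def focal_self_distillation_loss(feat_student, feat_teacher, fg_mask):
    """Decouple spatial and channel learning: match spatial attention on
    foreground pixels and channel attention globally (assumed formulation)."""
    # Spatial attention: mean absolute activation over channels -> (B, H, W)
    sp_s = feat_student.abs().mean(dim=1)
    sp_t = feat_teacher.abs().mean(dim=1)
    # Focus the spatial term on valuable (foreground) pixels via the mask
    spatial_term = F.mse_loss(sp_s * fg_mask, sp_t * fg_mask)

    # Channel attention: mean absolute activation over space -> (B, C)
    ch_s = feat_student.abs().mean(dim=(2, 3))
    ch_t = feat_teacher.abs().mean(dim=(2, 3))
    channel_term = F.mse_loss(ch_s, ch_t)

    return spatial_term + channel_term


def discriminative_self_distillation_loss(feat, fg_mask, margin=1.0):
    """Push pooled foreground and background prototypes apart so background
    pixels do not dominate the representation (assumed hinge formulation)."""
    bg_mask = 1.0 - fg_mask
    eps = 1e-6
    # Masked average pooling of foreground / background features -> (B, C)
    fg_feat = (feat * fg_mask.unsqueeze(1)).sum(dim=(2, 3)) / (
        fg_mask.sum(dim=(1, 2)).unsqueeze(1) + eps)
    bg_feat = (feat * bg_mask.unsqueeze(1)).sum(dim=(2, 3)) / (
        bg_mask.sum(dim=(1, 2)).unsqueeze(1) + eps)
    # Penalize foreground/background similarity above the chosen margin
    sim = F.cosine_similarity(fg_feat, bg_feat, dim=1)
    return F.relu(sim - (1.0 - margin)).mean()


if __name__ == "__main__":
    B, C, H, W = 2, 256, 25, 25
    feat_student = torch.randn(B, C, H, W, requires_grad=True)
    feat_teacher = torch.randn(B, C, H, W)          # e.g., features from a frozen copy of the tracker
    fg_mask = (torch.rand(B, H, W) > 0.7).float()   # binary foreground mask from the ground-truth box

    loss = (focal_self_distillation_loss(feat_student, feat_teacher.detach(), fg_mask)
            + discriminative_self_distillation_loss(feat_student, fg_mask))
    loss.backward()
    print(float(loss))
```

In such a setup, these feature-level terms would be added to the usual task-level classification and regression losses, which matches the abstract's framing of the method as a generic training protocol rather than an architectural change.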