Abstract: Achieving a good trade-off between accuracy and efficiency in convolutional neural networks is highly desirable for deep classification-based trackers. However, existing methods make predictions at the last exit for every sample, which is time-consuming. Motivated by this, we propose a multi-exit architecture based on the principle of knowledge distillation, which speeds up prediction by encouraging early exits to imitate later, more accurate exits. Specifically, we propose a distillation-based multi-exit fully convolutional network (FCN), named DMENet, for visual tracking. In DMENet, different types of attention mechanisms are embedded into different representation levels of the FCN to capture more discriminative information. Three exits are then attached at different levels of the FCN so that the processing of a frame can stop early. DMENet is trained offline with knowledge distillation to improve the accuracy of the early exits. The confidence score of an exit is used to decide whether to locate the target at that exit, when confidence is high, or to continue to the next exit. Extensive evaluation on the OTB-100, UAV123, LaSOT and VOT2018 benchmarks demonstrates that the proposed tracker outperforms state-of-the-art approaches while running at high speed (36 FPS).
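The two mechanisms named in the abstract, confidence-gated early exit and distillation from a later exit to an earlier one, can be illustrated with a minimal Python sketch. It is not the DMENet implementation. The function names, the use of the maximum softmax probability as the confidence score, the threshold value, and the temperature-scaled KL form of the distillation loss are all assumptions made for illustration; the abstract does not specify them. Each exit is modelled as a hypothetical callable that returns class logits (for example, target versus background).

```python
import math
from typing import Callable, List, Sequence, Tuple


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,  # assumed value, not taken from the paper
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: the early exit is the student and a later exit is the teacher.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


def early_exit_predict(
    exits: Sequence[Callable[[object], Sequence[float]]],
    frame: object,
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Tuple[int, int, float]:
    """Evaluate exits in order and stop at the first confident one.

    Returns (exit_index, predicted_class, confidence). The last exit is
    always accepted so that every frame receives a prediction.
    """
    if not exits:
        raise ValueError("at least one exit is required")
    for index, exit_fn in enumerate(exits):
        probs = [math.exp(lp) for lp in log_softmax(exit_fn(frame))]
        confidence = max(probs)
        if confidence >= threshold or index == len(exits) - 1:
            return index, probs.index(confidence), confidence
    raise AssertionError("unreachable: the last exit always returns")


# Hypothetical stand-ins for three exits of increasing confidence.
toy_exits = [
    lambda frame: [0.1, 0.0],
    lambda frame: [5.0, 0.0],
    lambda frame: [0.0, 9.0],
]
exit_index, label, confidence = early_exit_predict(toy_exits, frame=None)
```

In this toy run the first exit is not confident enough, so processing continues and stops at the second exit. In a real multi-exit network the later exits would reuse features computed for the earlier ones rather than run as independent functions, which is where the speed gain comes from.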