Abstract: Visual face tracking is one of the most important components of face analysis in mobile applications and video surveillance systems. In this paper, we propose an efficient face tracker called FT-RCNN, short for Face Tracking with Region-based CNN, which is based on the Faster-RCNN framework. A simple yet effective tracking branch is added so that the framework can jointly perform face detection and tracking. To address the problem of insufficient training data for face tracking, we propose a novel pairwise training strategy that lets us train the face tracker on existing face detection datasets, eliminating the need to collect and annotate video data specifically for face tracking. Furthermore, we devise a novel loss function, termed Pair-hard Triplet Cosine Loss, that employs a pair-hard triplet mining strategy to increase the discriminative power of our face tracker. We evaluate the proposed tracker on popular video face datasets including MobiFace, ChokePoint and YouTube Face. The results show that FT-RCNN outperforms state-of-the-art face trackers and runs at real-time speed. We plan to release the source code to support reproducibility.