- Keywords: Human Instance Segmentation, Prior Knowledge, Self-supervised
- TL;DR: A self-supervised framework that leverages prior knowledge about video data to improve the performance of human instance segmentation without requiring additional annotations
- Abstract: Existing approaches to human-centered tasks such as human instance segmentation focus on improving model architectures, leveraging weak supervision, or transforming supervision among related tasks. However, such architectures are highly task-specific, and weak supervision is limited by the available priors or the number of related tasks. In this paper, we present a novel self-supervised framework for human instance segmentation. The framework comprises two modules: one iteratively performs mutual refinement between segmentation and optical flow estimation, and the other iteratively refines pose estimates by exploiting prior knowledge about the consistency of human graph structures across consecutive frames. The outputs of the framework are then used to fine-tune segmentation networks in a feedback fashion. Experimental results on the OCHuman and COCOPersons datasets show that the self-supervised framework achieves state-of-the-art performance compared with existing models on these challenging datasets without requiring additional labels. Unlabelled video data is used together with prior knowledge to significantly improve performance and reduce the reliance on annotations.
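
To make the idea of coupling segmentation with optical flow concrete, the following is a minimal sketch, not the authors' method: it assumes a dense flow field in pixels and uses a simple nearest-pixel forward warp plus IoU as a temporal-consistency score. The function names `warp_mask` and `mask_iou` and the toy data are hypothetical and chosen for illustration only.

```python
import numpy as np

def warp_mask(mask, flow):
    """Forward-warp a binary mask with a dense flow field (nearest pixel).

    Assumption: flow[..., 0] is the horizontal (x) displacement and
    flow[..., 1] is the vertical (y) displacement, both in pixels.
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    new_xs = np.rint(xs + flow[ys, xs, 0]).astype(int)
    new_ys = np.rint(ys + flow[ys, xs, 1]).astype(int)
    # Drop pixels that move outside the image.
    inside = (new_xs >= 0) & (new_xs < w) & (new_ys >= 0) & (new_ys < h)
    warped = np.zeros((h, w), dtype=bool)
    warped[new_ys[inside], new_xs[inside]] = True
    return warped

def mask_iou(a, b):
    """Intersection over union of two binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0

# Toy data (hypothetical): a 2x2 person mask that moves 2 pixels to the right.
mask_t = np.zeros((6, 6), dtype=bool)
mask_t[1:3, 1:3] = True
flow = np.zeros((6, 6, 2))
flow[..., 0] = 2.0
mask_t1 = np.zeros((6, 6), dtype=bool)
mask_t1[1:3, 3:5] = True

# High IoU means the mask at frame t+1 agrees with the flow-propagated mask.
consistency = mask_iou(warp_mask(mask_t, flow), mask_t1)
```

In a self-supervised setting, a score like `consistency` could serve as a signal for which frames have reliable pseudo-labels; how the paper actually combines the two estimates is not specified in the abstract.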