Point Evolution Hierarchy Network for Weak Single-Point Human Parsing

Published: 2025, Last Modified: 22 Jan 2026. Venue: IEEE Trans. Image Process. 2025. License: CC BY-SA 4.0
Abstract: Weakly supervised human parsing aims to segment the human body into fine-grained categories using weak labels (e.g., point-level, scribble-level, or image-level labels). Point-level labels, and single-point supervision in particular, preserve spatial positions while requiring little annotation time, which is particularly advantageous for alleviating the human labeling burden. However, obtaining satisfactory parsing performance from such sparse point annotations remains challenging and requires further investigation. In this paper, we propose a novel end-to-end Point Evolution Hierarchy human parsing Network (PEHNet) for the fine-grained human parsing task that leverages only single-point supervision. Following a divide-and-conquer strategy, we partition all pixels into three distinct groups, i.e., single-point labels, pseudo-region labels, and unlabeled pixels, and optimize each group with a suitable mechanism. To expand the coverage of the single-point labels, we introduce a point dissemination module that generates high-quality pseudo-region labels. Furthermore, the point-level spatial position information inherently preserves the structural characteristics of the human body. Inspired by this hierarchical property, we devise a point-level human hierarchy-wise constraint that guides the prediction probabilities to align with the inherent hierarchy of the human body. Experimental results demonstrate that the proposed PEHNet outperforms state-of-the-art parsing methods on two popular human parsing benchmark datasets (LIP and ATR) and one semantic segmentation dataset (Pascal VOC 2012).
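The three-way pixel partition described in the abstract can be illustrated with a minimal sketch. This is not the PEHNet implementation: the function name `partition_pixels`, the `IGNORE` marker value, and the rule that an annotated point takes precedence over a pseudo-region label are all assumptions made for illustration, and the pseudo-region map is simply given as input rather than produced by a point dissemination module.

```python
IGNORE = 255  # hypothetical "no label" marker; an assumption, not from the paper


def partition_pixels(point_labels, pseudo_labels, ignore=IGNORE):
    """Split pixel coordinates into the three groups named in the abstract.

    point_labels and pseudo_labels are 2D lists of class ids with the same
    shape, holding `ignore` wherever no label exists.
    """
    groups = {"point": [], "pseudo": [], "unlabeled": []}
    for r, (point_row, pseudo_row) in enumerate(zip(point_labels, pseudo_labels)):
        for c, (p, q) in enumerate(zip(point_row, pseudo_row)):
            if p != ignore:
                # Annotated single point (assumed to take precedence).
                groups["point"].append((r, c))
            elif q != ignore:
                # Covered only by a pseudo-region label.
                groups["pseudo"].append((r, c))
            else:
                # No supervision of either kind.
                groups["unlabeled"].append((r, c))
    return groups


# Toy 3x3 image: one annotated point of class 1, and a small pseudo-region
# of class 1 around it (assumed to have been generated elsewhere).
point_labels = [
    [IGNORE, IGNORE, IGNORE],
    [IGNORE, 1, IGNORE],
    [IGNORE, IGNORE, IGNORE],
]
pseudo_labels = [
    [IGNORE, 1, IGNORE],
    [1, 1, 1],
    [IGNORE, IGNORE, IGNORE],
]
groups = partition_pixels(point_labels, pseudo_labels)
```

Each group could then be passed to its own loss term; which losses the paper actually uses for each group is not stated in the abstract.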