Abstract: Current methods for multiperson pose estimation
(MPPE) typically treat human detection and joint association
separately. They introduce complex hand-crafted post-processes
such as RoI cropping, NMS, and grouping, or rely on dense
representations to preserve the spatial features. In this article,
we take a closer look at this task and propose a simple
yet effective framework, termed SparsePose, which directly
predicts multiperson joint coordinates from the full image without
any post-processing or dense representations. In SparsePose,
the full-body instances are decoupled through spatial-aware
feature learning (SFL) without box or classification
supervision. To improve the quality of the instance map, an
instance contrastive constraint (ICC) and a center correction (CC)
strategy are proposed to make the instance-wise spatial features
more discriminative. Importantly, we propose a visibility-guided
weighting mechanism that makes the model confident in its predictions
for visible joints and insensitive to occlusions or partial
bodies. Overall, SparsePose is conceptually simpler and performs
favorably against existing counterparts on three benchmarks
in terms of both accuracy and efficiency.