Multi-hypothesis representation learning for transformer-based 3D human pose estimation

Published: 01 Jan 2023, Last Modified: 13 Nov 2024 · Pattern Recognit. 2023 · CC BY-SA 4.0
Abstract:

Highlights:
• We present a new Transformer-based method, called Multi-Hypothesis Transformer (MHFormer++), for 3D human pose estimation from monocular videos. It builds a one-to-many-to-one framework that effectively learns spatio-temporal representations of multiple pose hypotheses in an end-to-end manner.
• A Multi-Hypothesis Generation (MHG) module is designed to capture both global and local information of human body joints within each frame and to generate multiple hypothesis representations containing diverse semantic information in the spatial domain.
• A Self-Hypothesis Refinement (SHR) module and a Cross-Hypothesis Interaction (CHI) module are introduced to model temporal consistencies across frames and to enable communication among multi-hypothesis features, both independently and mutually, in the temporal domain.
• The proposed method achieves state-of-the-art performance on two challenging 3D human pose estimation benchmark datasets.
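The authors' implementation is not shown on this page; the following is a minimal PyTorch-style sketch of how a one-to-many-to-one multi-hypothesis pipeline of this kind could be organized. The class structure, hypothesis count, layer choices, and dimensions are illustrative assumptions, not the paper's architecture.

```python
# Conceptual sketch (assumption, not the authors' code) of a one-to-many-to-one
# multi-hypothesis pipeline for lifting 2D keypoint sequences to 3D poses.
import torch
import torch.nn as nn


class MultiHypothesisSketch(nn.Module):
    def __init__(self, num_joints=17, num_hyp=3, dim=256):
        super().__init__()
        self.num_hyp = num_hyp
        # One-to-many: embed 2D joints and branch into several hypothesis streams
        # (stands in for the Multi-Hypothesis Generation / MHG stage).
        self.embed = nn.Linear(num_joints * 2, dim)
        self.mhg = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
             for _ in range(num_hyp)]
        )
        # Refine each hypothesis independently over time
        # (stands in for Self-Hypothesis Refinement / SHR).
        self.shr = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
             for _ in range(num_hyp)]
        )
        # Many-to-one: let hypotheses exchange information, then fuse them
        # (stands in for Cross-Hypothesis Interaction / CHI).
        self.chi = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.head = nn.Linear(dim, num_joints * 3)

    def forward(self, pose_2d):
        # pose_2d: (batch, frames, joints, 2) detected 2D keypoints per frame
        b, t, j, _ = pose_2d.shape
        x = self.embed(pose_2d.reshape(b, t, j * 2))            # (b, t, dim)
        hyps = [layer(x) for layer in self.mhg]                 # one-to-many
        hyps = [layer(h) for layer, h in zip(self.shr, hyps)]   # per-stream refinement
        # Concatenate hypotheses along the sequence axis so attention can mix
        # them, then average back to a single stream (many-to-one fusion).
        mixed = self.chi(torch.cat(hyps, dim=1))                # (b, num_hyp*t, dim)
        fused = mixed.reshape(b, self.num_hyp, t, -1).mean(dim=1)
        return self.head(fused).reshape(b, t, j, 3)             # 3D pose per frame


if __name__ == "__main__":
    model = MultiHypothesisSketch()
    out = model(torch.randn(2, 27, 17, 2))
    print(out.shape)  # torch.Size([2, 27, 17, 3])
```

The fusion step here is a simple mean over hypothesis streams; the point of the sketch is only the overall flow — generate several spatial hypotheses, refine each temporally, then let them interact before regressing a single 3D pose sequence.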
