Pose Sequence Model Using the Encoder-Decoder Structure for 3D Pose Estimation

Published: 01 Jan 2022, Last Modified: 29 Oct 2024 | DMBD (1) 2022 | CC BY-SA 4.0
Abstract: Human pose estimation is an active research problem in computer vision, with applications in autonomous driving, security, film and television, and action monitoring in special scenes. Because a single 2D skeleton usually corresponds to multiple 3D skeletons, the mapping from 2D to 3D in monocular video has inherent depth ambiguity and is ill-posed, which makes 3D human pose estimation from monocular video challenging. In this paper, we propose a Pose Sequence Model (PSM) for 3D human pose estimation in monocular video, which combines a fully convolutional neural network based on extended convolution with a Long Short-Term Memory (LSTM) network. Convolution is used to extract spatial features, and the LSTM is used to capture temporal features. With this model, 3D human poses can be predicted from 2D pose sequences. Compared with previous work on classical datasets, our method achieves good results.
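The depth ambiguity mentioned in the abstract can be illustrated with a small, self-contained sketch. It is not part of the PSM method. It assumes an ideal pinhole camera with a hypothetical focal length of 1.0 and a made-up three-joint "skeleton", and the helper names `project` and `push_along_rays` are introduced here only for illustration. It shows that two different 3D joint configurations can produce the same 2D skeleton.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project(joints: List[Point3D], focal: float = 1.0) -> List[Point2D]:
    """Ideal pinhole projection (x, y, z) -> (f*x/z, f*y/z); assumes z > 0."""
    return [(focal * x / z, focal * y / z) for x, y, z in joints]


def push_along_rays(joints: List[Point3D], scales: List[float]) -> List[Point3D]:
    """Slide each joint along its own camera ray by a per-joint scale factor."""
    return [(s * x, s * y, s * z) for (x, y, z), s in zip(joints, scales)]


# Hypothetical 3-joint chain in camera coordinates (values are illustrative only).
skeleton_a = [(0.0, 0.0, 4.0), (0.2, 0.5, 4.0), (0.2, 1.0, 4.0)]

# A different 3D configuration: every joint moves along its viewing ray,
# so its image location is unchanged while its depth changes.
skeleton_b = push_along_rays(skeleton_a, [1.0, 1.25, 0.75])

proj_a = project(skeleton_a)
proj_b = project(skeleton_b)

# Compare with a tolerance because of floating-point rounding.
same_2d = all(
    abs(ua - ub) < 1e-9 and abs(va - vb) < 1e-9
    for (ua, va), (ub, vb) in zip(proj_a, proj_b)
)
different_3d = skeleton_a != skeleton_b

print("same 2D skeleton:", same_2d)
print("different 3D skeletons:", different_3d)
```

Since a single frame cannot separate these two configurations, the 2D-to-3D mapping needs additional cues. The abstract's use of 2D sequences and temporal features from an LSTM is one way of supplying such cues.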