Multi-View 2D to 3D Lifting Video-Based Optimization: A Robust Approach for Human Pose Estimation with Occluded Joint Prediction

Daniela Rato, Miguel Oliveira, Vítor M. F. Santos, Angel Domingo Sappa, Bogdan Raducanu

Published: 01 Jan 2024, Last Modified: 25 Jul 2025IROS 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: In the context of robotics, accurate 3D human pose estimation is essential for enhancing human-robot collaboration and interaction. This manuscript introduces a multi-view 2D to 3D lifting optimization-based method designed for video-based 3D human pose estimation, incorporating temporal information. Our technique addresses key challenges, namely robustness to 2D joint detection error, occlusions, and varying camera perspectives. We evaluate the performance of the algorithm through extensive experiments on the MPI-INF-3DHP dataset. Our method demonstrates very good robustness up to 25 pixels of 2D joint error and shows resilience in scenarios involving several occluded joints. Comparative analyses against existing 2D to 3D lifting and multi-view methods showcase good performance of our approach.