Evaluation of Accuracy and Angle Dependency of 3D Pose Estimation through Stereo Camera Information Fusion with MediaPipe Pose

Published: 01 Jan 2024, Last Modified: 27 Feb 2025, Venue: FUSION 2024, License: CC BY-SA 4.0
Abstract: In recent years, significant research has been conducted on video-based human pose estimation (HPE). While monocular 2D HPE achieves high performance, monocular 3D HPE remains more challenging. To combine the high accuracy of 2D HPE with the greater usability of 3D coordinates, we propose a method that reconstructs 3D poses by applying MediaPipe Pose 2D HPE to stereo camera images and combining the results through epipolar geometry and direct triangulation. We use the CMU Panoptic database, which provides recordings of humans from 31 different HD views together with 3D ground truth data, to investigate what accuracy can be achieved by fusing only two cameras without prior stereo calibration. We also investigate which camera perspectives to employ by analyzing the angle dependency of our approach.
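The triangulation step mentioned in the abstract can be illustrated with a minimal sketch of linear (DLT) triangulation for a single keypoint seen in two views. This is not the authors' implementation. It assumes that the two 3x4 projection matrices are already available, whereas the abstract describes a setting without prior stereo calibration, where the relative camera geometry would have to be estimated first. The function names `triangulate_point` and `project`, the intrinsics `K`, and the 0.5 m baseline are hypothetical values chosen only for illustration, and the 2D inputs are synthetic projections rather than MediaPipe Pose output.

```python
import numpy as np


def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one keypoint observed in two views.

    P1, P2: 3x4 projection matrices (assumed known for this sketch).
    x1, x2: pixel coordinates (u, v) of the same keypoint in each view.
    """
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point to pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical toy setup: identical intrinsics, second camera shifted along x.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

# A synthetic "joint" position stands in for a detected 2D keypoint pair.
X_true = np.array([0.2, -0.1, 3.0])
X_est = triangulate_point(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noise-free inputs the reconstruction recovers the original point up to numerical precision. With real 2D detections, the error would depend on detection noise and on the angle between the two views, which is the dependency the abstract sets out to analyze.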