Abstract: Modeling humans in physical scenes is vital for understanding human-environment
interactions in applications such as augmented reality or the assessment of human actions
from video (e.g., sports or physical rehabilitation). State-of-the-art methods begin with
a 3D human pose, estimated from monocular or multiple views, and use this representation to
ground the person within a 3D world space. While standard accuracy metrics capture joint position errors, they do not consider the physical plausibility of the 3D pose. This
limitation has motivated researchers to propose additional metrics that evaluate jitter, floor penetration, and unbalanced postures. Yet these metrics measure isolated instances
of error and are not representative of balance or stability during motion. In this work,
we propose measuring physical plausibility from within a physics simulation. We introduce two metrics to capture the physical plausibility and stability of predicted 3D poses
from any 3D human pose estimation model. Using physics simulation, we discover
correlations with existing plausibility metrics and measure stability during motion. We
evaluate and compare the performance of two state-of-the-art methods, a multi-view
triangulated baseline, and ground-truth 3D markers from the Human3.6M dataset.