Real-Time Reinforcement Learning for Optimal Viewpoint Selection in Monocular 3D Human Pose Estimation

Published: 01 Jan 2024, Last Modified: 05 Apr 2025 · IEEE Access 2024 · CC BY-SA 4.0
Abstract: Monocular 3D human pose estimation (HPE) is an inherently ill-posed problem, complicated by depth ambiguity and uncertainty. The accuracy of pose estimation from a single camera depends heavily on the viewpoint, and an unfavorable viewpoint can degrade it substantially. To address these challenges, we propose a real-time reinforcement learning-based viewpoint selection method that dynamically adjusts the camera viewpoint to optimize pose estimation. Our method extracts features encoding depth ambiguity and uncertainty from 2D-to-3D lifting, allowing the model to identify optimal camera movements without requiring multiple cameras. We evaluate our approach on a publicly available real-world dataset, adapted to simulate a realistic setting of drone flights capturing human motion. Compared against baseline strategies, including fixed, random, and rotating camera movements with various 3D HPE models, our approach significantly enhances the accuracy and robustness of pose estimation. In particular, it reduces pose estimation errors by over 30% relative to fixed and random camera movements. These results highlight the effectiveness of our method in optimizing viewpoint selection for real-time 3D HPE, making it a practical solution for single-camera setups in dynamic environments. Our code is available at https://github.com/knu-vis/nbv-pose.
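To make the idea concrete, the following is a minimal sketch of how an RL agent could map uncertainty features from 2D-to-3D lifting to discrete camera movements. This is an illustrative toy (a linear Q-function with epsilon-greedy action selection), not the paper's actual architecture; the action names, feature dimensionality, and reward convention are all assumptions.

```python
import numpy as np

# Hypothetical action space; the paper's actual camera-movement set may differ.
ACTIONS = ["stay", "rotate_left", "rotate_right", "ascend", "descend"]

class ViewpointAgent:
    """Toy linear Q-function over per-frame uncertainty features.

    state:  feature vector encoding depth ambiguity / lifting uncertainty
    action: index into ACTIONS, the camera movement for the next step
    """

    def __init__(self, n_features, n_actions, lr=0.1, gamma=0.9, eps=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = np.zeros((n_actions, n_features))  # one weight row per action
        self.lr, self.gamma, self.eps = lr, gamma, eps

    def act(self, state):
        # Epsilon-greedy: explore occasionally, otherwise pick the argmax Q-value.
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.W)))
        return int(np.argmax(self.W @ state))

    def update(self, s, a, r, s_next):
        # One-step TD update; here the reward r would plausibly be the
        # negative pose-estimation error after moving the camera.
        target = r + self.gamma * np.max(self.W @ s_next)
        td_err = target - self.W[a] @ s
        self.W[a] += self.lr * td_err * s
        return td_err
```

In a deployment loop, the agent would observe fresh uncertainty features each frame, choose a movement, receive a reward tied to the resulting pose error, and update online.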