Keywords: Monocular 3D Detection; Robot Companions; Human-Centered Automation; Surveillance Robotic Systems
TL;DR: We present a real-time, optimization-based method that enables a robot with a single camera to robustly locate a person despite severe ego-motion by simultaneously estimating the person's 3D position and the camera's 2D attitude.
Abstract: Robust person localization from a moving camera is a fundamental skill for robots to navigate and interact with humans in the open world. However, the diversity of robot platforms and environments poses a significant generalization challenge. Learning-based methods, often trained on datasets with limited camera motion, fail in out-of-distribution (OOD) scenarios involving severe camera ego-motion. To address this, we propose an optimization-based method that models the human body with a four-point skeleton to jointly estimate the camera attitude and the person's 3D location. Our approach avoids reliance on large-scale training data and generalizes across different viewpoints and image projections. Real-robot experiments and dataset evaluations show that our method outperforms existing approaches, especially in such challenging OOD scenarios. The system is deployed for person-following on an agile quadruped, demonstrating its utility for robust open-world Human-Robot Interaction (HRI).
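To make the joint-estimation idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes (none of this is specified in the abstract) a pinhole camera with known intrinsics `K`, a hypothetical four-point skeleton laid out as a vertical segment of fixed height, an interpretation of "2D attitude" as pitch and roll, and a generic nonlinear least-squares solver standing in for whatever optimizer the paper actually uses:

```python
# Hedged sketch: jointly fit the person's 3D position and the camera's
# pitch/roll to observed 2D keypoints by minimizing reprojection error.
# All model constants below are illustrative assumptions, not from the paper.
import numpy as np
from scipy.optimize import least_squares

K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])  # hypothetical camera intrinsics

# Hypothetical four-point body model in the person's local frame (metres):
# feet, pelvis, neck, head stacked vertically (camera y-axis points down).
SKELETON = np.array([[0.0,  0.0, 0.0],
                     [0.0, -0.9, 0.0],
                     [0.0, -1.4, 0.0],
                     [0.0, -1.7, 0.0]])

def rot_pitch_roll(pitch, roll):
    """Rotation from a camera-aligned world frame into the camera frame."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # pitch about x
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])   # roll about z
    return Rx @ Rz

def residuals(params, uv_observed):
    x, y, z, pitch, roll = params
    # Place the skeleton at (x, y, z), rotate into the camera frame, project.
    pts_cam = (rot_pitch_roll(pitch, roll) @ (SKELETON + np.array([x, y, z])).T).T
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    return (uv - uv_observed).ravel()  # 8 residuals vs. 5 unknowns

def locate(uv_observed, init=(0.0, 0.0, 3.0, 0.0, 0.0)):
    """uv_observed: 4x2 pixel detections from any off-the-shelf 2D pose estimator."""
    sol = least_squares(residuals, init, args=(uv_observed,))
    x, y, z, pitch, roll = sol.x
    return np.array([x, y, z]), (pitch, roll)
```

Because the attitude and the position are estimated in the same objective, a tilt of the camera is explained by rotation rather than mistaken for the person moving, which is the failure mode under severe ego-motion that the paper targets.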
Supplementary Material: zip
Submission Number: 14