Keywords: Monocular 3D Detection; Robot Companions; Human-Centered Automation; Surveillance Robotic Systems
TL;DR: We present a real-time, optimization-based method that enables a robot with a single camera to robustly locate a person despite severe ego-motion by simultaneously estimating the person's 3D position and the camera's 2D attitude.
Abstract: Robust person localization from a moving camera is a fundamental skill for robots to navigate and interact with humans in the open world. However, the diversity of robot platforms and environments poses a significant generalization challenge. Learning-based methods, often trained on datasets with limited camera motion, fail in out-of-distribution (OOD) scenarios involving severe camera ego-motion. To address this, we propose an optimization-based method that models the human body with a four-point skeleton to jointly estimate the camera attitude and the person's 3D location. Our approach avoids reliance on large-scale training data and generalizes across different viewpoints and image projections. Real-robot experiments and dataset evaluations show that our method outperforms existing approaches, especially in such challenging OOD scenarios. The system is deployed for person-following on an agile quadruped, demonstrating its utility for robust open-world Human-Robot Interaction (HRI).
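To make the joint-estimation idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes (none of this is specified in the abstract) a pinhole camera with known intrinsics `K`, a hypothetical four-point skeleton laid out as a vertical segment of fixed height, an interpretation of "2D attitude" as pitch and roll, and a generic nonlinear least-squares solver standing in for whatever optimizer the paper actually uses:

```python
# Hedged sketch: jointly fit the person's 3D position and the camera's
# pitch/roll to observed 2D keypoints by minimizing reprojection error.
# All model constants below are illustrative assumptions, not from the paper.
import numpy as np
from scipy.optimize import least_squares

K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])  # hypothetical camera intrinsics

# Hypothetical four-point body model in the person's local frame (metres):
# feet, pelvis, neck, head stacked vertically (camera y-axis points down).
SKELETON = np.array([[0.0,  0.0, 0.0],
                     [0.0, -0.9, 0.0],
                     [0.0, -1.4, 0.0],
                     [0.0, -1.7, 0.0]])

def rot_pitch_roll(pitch, roll):
    """Rotation from a camera-aligned world frame into the camera frame."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])   # pitch about x
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])   # roll about z
    return Rx @ Rz

def residuals(params, uv_observed):
    x, y, z, pitch, roll = params
    # Place the skeleton at (x, y, z), rotate into the camera frame, project.
    pts_cam = (rot_pitch_roll(pitch, roll) @ (SKELETON + np.array([x, y, z])).T).T
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]
    return (uv - uv_observed).ravel()  # 8 residuals vs. 5 unknowns

def locate(uv_observed, init=(0.0, 0.0, 3.0, 0.0, 0.0)):
    """uv_observed: 4x2 pixel detections from any off-the-shelf 2D pose estimator."""
    sol = least_squares(residuals, init, args=(uv_observed,))
    x, y, z, pitch, roll = sol.x
    return np.array([x, y, z]), (pitch, roll)
```

Because the attitude and the position are estimated in the same objective, a tilt of the camera is explained by rotation rather than mistaken for the person moving, which is the failure mode under severe ego-motion that the paper targets.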
Supplementary Material: zip
Submission Number: 14