HMD^2: Environment-aware Motion Generation from Single Egocentric Head-Mounted Device

Published: 23 Mar 2025, Last Modified: 24 Mar 20253DV 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Motion Estimation, Motion Generation, Ego-centric, Multi-modal, Head-mounted device
TL;DR: We propose HMD^2, the first system for the online generation of full-body self-motion using a single head-mounted device (e.g. Project Aria Glasses) equipped with an outward-facing camera in complex and diverse environments.
Abstract: This paper investigates the generation of realistic full-body human motion using a single head-mounted device with an outward-facing color camera and the ability to perform visual SLAM. To address the ambiguity of this setup, we present HMD$^2$, a novel system that balances motion reconstruction and generation. From a reconstruction standpoint, it aims to maximally utilize the camera streams to produce both analytical and learned features, including head motion, SLAM point cloud, and image embeddings. On the generative front, HMD$^2$ employs a multi-modal conditional motion diffusion model with a Transformer backbone to maintain temporal coherence of generated motions, and utilizes autoregressive inpainting to facilitate online motion inference with minimal latency (0.17 seconds). We show that our system provides an effective and robust solution that scales to a diverse dataset of over 200 hours of motion in complex indoor and outdoor environments.
Supplementary Material: pdf
Submission Number: 38
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview