Keywords: Contact, Scene, Human-Scene interaction, Human, Pose, 3D Vision, Dataset
Abstract: To develop vision-based methods for understanding how people interact with their physical environment, we introduce a multi-view video and body contact sensor dataset of dynamic human activities with frequent body-environment contact, such as parkour, physical training, and gym exercises. The dataset contains 780K images across 130K pose sequences from 7 subjects. Each subject is captured by 6 synchronized third-person cameras, a single egocentric camera, and multiple body-worn contact sensors. Using this dataset, we benchmark state-of-the-art vision-based body contact models and show that existing methods have significant limitations. We also benchmark existing human pose estimation methods and show that they fail under the heavy occlusion caused by close interaction with the environment. This suggests that the dataset can also support the development of pose estimation models that are more robust to such interactions. Finally, to facilitate better human pose estimation from video, we introduce and evaluate a video-based human contact detection model that outperforms existing image-based methods, underscoring the potential benefit of integrating contact information into pose estimation models. Code and data will be made publicly available.
Supplementary Material: pdf
Submission Number: 291