Abstract: Reconstructing human-object interaction in 3D from a single RGB image is a challenging task, and existing data-driven methods do not generalize beyond the objects present in carefully curated 3D interaction datasets. Capturing large-scale real data to learn strong interaction and 3D shape priors is very expensive due to the combinatorial nature of human-object interactions. In this paper, we propose ProciGen (Procedural interaction Generation), a method to procedurally generate datasets with both plausible interaction and diverse object variation. We generate 1M+ human-object interaction pairs in 3D and leverage this large-scale data to train our HDM (Hierarchical Diffusion Model), a novel method to reconstruct interacting human and unseen object instances without any templates. Our HDM is an image-conditioned diffusion model that learns both realistic interaction and highly accurate human and object shapes. Experiments show that our HDM trained with ProciGen significantly outperforms prior methods that require template meshes, and that our dataset enables training methods with strong generalization to unseen object instances. Our code and data are released.