Human Interaction-Aware 3D Reconstruction from a Single Image
Keywords: textured 3D human models, multi-human scenes, geometric distortions, occlusions and proximity
Abstract: Reconstructing textured 3D human models from a single image is fundamental for AR/VR and digital human applications. However, existing methods mostly focus on single individuals and thus fail in multi-human scenes, where naive composition leads to artifacts such as unrealistic overlaps, missing geometry in occluded regions, and distorted interactions. To address this, we propose HUG3D, a holistic framework that explicitly models both group- and instance-level information. To mitigate perspective-induced geometric distortions, we first transform the input into a canonical orthographic space. The Human Group-Instance Multi-View Diffusion (HUG-MVD) module then generates complete multi-view normal maps and images by jointly modeling individuals and their group context. Subsequently, the Human Group-Instance Geometric Reconstruction (HUG-GR) module optimizes the geometry, leveraging physics-based interaction priors to accurately model inter-human contact, followed by high-fidelity texture reconstruction. Extensive experiments show that HUG3D significantly outperforms prior methods, producing physically plausible, high-fidelity 3D reconstructions of interacting people from a single image.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 7