Identity-Preserving Human Reconstruction from a Single Image via Explicit 3D Reasoning

Yanqi Bao; Jiaxiang Shang; Yang Gao; Yingchun Liu; Jing Huo; Jing Liao

19 Sept 2025 (modified: 14 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0

Keywords: Identity-Preserving Human Reconstruction from a Single Image via 3D Token Reasoning

TL;DR: We present the Identity-Preserving Large Human Reconstruction Model (IPRM), a feed-forward framework that reconstructs photorealistic, clothed 3D humans from a single in-the-wild image while explicitly preserving 3D identity features.

Abstract: We present the Identity-Preserving Large Human Reconstruction Model (IPRM), a feed-forward framework that reconstructs photorealistic, clothed 3D humans from a single in-the-wild image while preserving 3D identity. Recent works predominantly infer 3D structure from 2D features, which makes it difficult to achieve 3D consistency while preserving human identity in 3D space. To alleviate these challenges, IPRM anchors monocular human reconstruction in a human-based 3D feature space and uses these 3D features to explicitly preserve identity and details. Specifically, we introduce an efficient and robust SMPL-based sparse voxel representation that transforms 2D identity features into 3D space and categorizes the resulting tokens as either visible 3D identity tokens or invisible tokens whose content must be inferred. Operating on these 3D tokens, an identity-aware 3D reasoning module propagates the projected 3D identity features from visible to invisible tokens, ensuring that only unobserved regions are inferred while the observed identity remains intact. IPRM then uses an encoder-decoder structure to decode the SMPL-based 3D features into 3DGS and mesh representations, together with a 3D ID Adapter designed for identity preservation. Instead of conditioning only on 2D image tokens, this adapter uses 3D identity tokens extracted from a single-view branch as guidance, injecting identity information at the 3D token level. Comprehensive experiments on existing benchmarks and in-the-wild data show that IPRM surpasses state-of-the-art methods in reconstruction performance, efficiency, and identity consistency.
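The split into visible and invisible 3D tokens described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the orthographic camera, the z-buffer visibility test, and every name below (`to_pixel`, `split_tokens`, the demo data) are assumptions chosen for illustration. Each voxel center is projected onto the image; the voxel nearest to the camera at a pixel receives that pixel's 2D feature, and occluded voxels are left as `None`, standing in for tokens that a reasoning module would fill in later.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[int, int]


def to_pixel(coord: float, size: int) -> int:
    """Map a coordinate in [-1, 1] to a pixel index in [0, size - 1]."""
    idx = int((coord + 1.0) * 0.5 * size)
    return min(max(idx, 0), size - 1)


def split_tokens(
    voxels: Sequence[Vec3],
    feature_map: Sequence[Sequence[Sequence[float]]],
) -> List[Optional[List[float]]]:
    """Return one token per voxel: a copied 2D feature if visible, else None.

    Assumes an orthographic camera looking along +z, so a smaller z value
    means the voxel is closer to the camera (hypothetical convention).
    """
    height, width = len(feature_map), len(feature_map[0])
    nearest: Dict[Pixel, int] = {}  # pixel -> index of the closest voxel
    pixels: List[Pixel] = []

    # Pass 1: project every voxel and keep the closest one per pixel.
    for i, (x, y, z) in enumerate(voxels):
        pix = (to_pixel(y, height), to_pixel(x, width))
        pixels.append(pix)
        best = nearest.get(pix)
        if best is None or z < voxels[best][2]:
            nearest[pix] = i

    # Pass 2: visible voxels take the pixel feature, occluded ones stay empty.
    tokens: List[Optional[List[float]]] = []
    for i, (row, col) in enumerate(pixels):
        if nearest[(row, col)] == i:
            tokens.append(list(feature_map[row][col]))
        else:
            tokens.append(None)
    return tokens


# Toy data: a 2x2 feature map with 1-D features and three voxel centers.
demo_features = [[[1.0], [2.0]], [[3.0], [4.0]]]
demo_voxels = [(-0.5, -0.5, 0.2), (-0.5, -0.5, 0.8), (0.5, 0.5, 0.5)]
demo_tokens = split_tokens(demo_voxels, demo_features)
print(demo_tokens)  # [[1.0], None, [4.0]]
```

In this toy case the second voxel sits behind the first at the same pixel, so it is the only token left to be inferred. A real system would presumably use a perspective camera, voxels placed around a fitted SMPL body, and learned features rather than raw pixel lookups.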

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 17377
