Consistent3DGen: Bridging Stochastic Generation and Deterministic Reconstruction for Image-to-3D Diffusion Models
Keywords: 3D Diffusion, Image-to-3D
Abstract: Recent large-scale 3D diffusion models have achieved remarkable success in generating high-quality, detailed 3D objects. However, because they rely on randomly sampled initial noise, these models often produce 3D objects that, while visually similar to the input images, lack precise consistency with them. We attribute this limitation to the fundamental tension between generative modeling and faithful reconstruction, and argue that the image-to-3D task should be treated as a combination of reconstruction at known views and completion at unknown views. To address this challenge, we propose Consistent3DGen, a training-free framework that enforces input consistency in existing 3D diffusion models. Our approach leverages state-of-the-art pixel-aligned point cloud reconstruction algorithms, such as VGGT, to obtain geometrically consistent 3D point clouds from input images. We then introduce a mechanism that maps these front-facing point clouds into the VAE latent space of 3D diffusion models, and design a novel algorithm that completes the unseen back of the object via front-partial denoising guidance. Extensive experiments demonstrate that our method achieves high front-facing consistency with the input image, especially in consistency-critical scenarios such as characters.
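The abstract does not spell out how the front-partial denoising guidance is carried out. The sketch below is one plausible reading, assuming a RePaint-style latent-blending loop in which the latent region covered by the reconstructed front-facing point cloud is re-injected at every reverse-diffusion step; the names denoiser, scheduler, z_front, and front_mask are illustrative placeholders (diffusers-style scheduler calls), not the paper's actual interface.

    # Minimal sketch of masked / partial denoising guidance (assumption,
    # not the paper's confirmed algorithm).
    import torch

    @torch.no_grad()
    def guided_sample(denoiser, scheduler, z_front, front_mask, cond, steps=50):
        """z_front:    VAE latent encoding the reconstructed front-facing points.
           front_mask: 1 where the latent is constrained by the reconstruction,
                       0 where the unseen back of the object must be generated."""
        z = torch.randn_like(z_front)              # random init for unknown regions
        scheduler.set_timesteps(steps)
        for t in scheduler.timesteps:
            # Re-noise the known (front) latent to the current noise level.
            noise = torch.randn_like(z_front)
            z_known = scheduler.add_noise(z_front, noise, t)
            # Pin the front region to the reconstruction; let the diffusion
            # model complete the unseen back region.
            z = front_mask * z_known + (1.0 - front_mask) * z
            eps = denoiser(z, t, cond)             # predicted noise
            z = scheduler.step(eps, t, z).prev_sample
        # Final blend so the front exactly matches the reconstruction latent.
        return front_mask * z_front + (1.0 - front_mask) * z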
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 2917