Consistent3DGen: Bridging Stochastic Generation and Deterministic Reconstruction for Image-to-3D Diffusion Models
Keywords: 3D Diffusion, Image-to-3D
Abstract: Recent large-scale 3D diffusion models have achieved remarkable success in generating high-quality, detailed 3D objects. However, because they rely on randomly sampled initial noise, these models often produce 3D objects that, while visually similar to the input images, lack precise consistency with them. We attribute this limitation to the fundamental tension between generative modeling and faithful reconstruction, and argue that the image-to-3D task should be treated as a combination of reconstruction at known views and completion at unknown views. To address this challenge, we propose Consistent3DGen, a training-free framework that enforces input consistency in existing 3D diffusion models. Our approach leverages state-of-the-art pixel-aligned point cloud reconstruction algorithms, such as VGGT, to obtain geometrically consistent 3D point clouds from input images. We then introduce a mechanism that maps these front-facing point clouds into the VAE latent space of 3D diffusion models, and design a novel algorithm that completes the unseen back of the object via front-partial denoising guidance. Extensive experiments demonstrate that our method achieves high front-facing consistency with the input image, especially in consistency-critical scenarios such as characters.
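The abstract does not spell out how the front-partial denoising guidance is carried out. The sketch below is one plausible reading, assuming a RePaint-style latent-blending loop in which the latent region covered by the reconstructed front-facing point cloud is re-injected at every reverse-diffusion step; the names denoiser, scheduler, z_front, and front_mask are illustrative placeholders (diffusers-style scheduler calls), not the paper's actual interface.

    # Minimal sketch of masked / partial denoising guidance (assumption,
    # not the paper's confirmed algorithm).
    import torch

    @torch.no_grad()
    def guided_sample(denoiser, scheduler, z_front, front_mask, cond, steps=50):
        """z_front:    VAE latent encoding the reconstructed front-facing points.
           front_mask: 1 where the latent is constrained by the reconstruction,
                       0 where the unseen back of the object must be generated."""
        z = torch.randn_like(z_front)              # random init for unknown regions
        scheduler.set_timesteps(steps)
        for t in scheduler.timesteps:
            # Re-noise the known (front) latent to the current noise level.
            noise = torch.randn_like(z_front)
            z_known = scheduler.add_noise(z_front, noise, t)
            # Pin the front region to the reconstruction; let the diffusion
            # model complete the unseen back region.
            z = front_mask * z_known + (1.0 - front_mask) * z
            eps = denoiser(z, t, cond)             # predicted noise
            z = scheduler.step(eps, t, z).prev_sample
        # Final blend so the front exactly matches the reconstruction latent.
        return front_mask * z_front + (1.0 - front_mask) * z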
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 2917