Keywords: 3D; Video Diffusion Model; 3D generation
Abstract: Existing single-image-to-3D creation methods typically involve a two-stage process: first generating multi-view images, and then using these images for 3D reconstruction. However, training these two stages separately introduces a significant data bias at inference time, which degrades the quality of the reconstructed results. We introduce a unified 3D generation framework, named SC3D, which integrates diffusion-based multi-view image generation and 3D reconstruction through a self-conditioning mechanism. In our framework, the two modules form a cyclic relationship so that each adapts to the distribution of the other. During the denoising process of multi-view generation, we feed color images and maps rendered by SC3D itself back to the multi-view generation module.
This self-conditioned method with 3D-aware feedback unites the entire pipeline and improves geometric consistency. Experiments show that our approach enhances sampling quality and improves both the efficiency and the output quality of the generation process.
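The feedback loop described in the abstract could be sketched as follows. This is a minimal illustration under assumed interfaces (the names multiview_denoiser, reconstruct_3d, and render_views are hypothetical placeholders, not SC3D's actual API): at each denoising step, the current multi-view estimate is lifted to 3D and re-rendered, and those renderings condition the next step.

```python
# Hypothetical sketch of the self-conditioned sampling loop: renderings of the
# current 3D reconstruction are fed back as conditioning during multi-view
# denoising. Module names and signatures are assumptions for illustration only.
import torch

@torch.no_grad()
def sc3d_sample(multiview_denoiser, reconstruct_3d, render_views,
                input_image, timesteps, num_views=4, image_size=256):
    # Start from Gaussian noise for each target view.
    x_t = torch.randn(num_views, 3, image_size, image_size)
    feedback = None  # no rendered feedback before the first reconstruction
    shape_3d = None

    for t in timesteps:  # e.g. reversed(range(T))
        # Denoise the multi-view images, conditioned on the input image and on
        # renderings produced by the framework itself (self-conditioning).
        x_t = multiview_denoiser(x_t, t, cond_image=input_image,
                                 self_cond=feedback)

        # Reconstruct a 3D representation from the current multi-view estimate,
        # then render it back into color images/maps for the next step.
        shape_3d = reconstruct_3d(x_t)
        feedback = render_views(shape_3d, num_views=num_views)

    return x_t, shape_3d
```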
Supplementary Material: zip
Primary Area: Diffusion based models
Submission Number: 2818