Abstract: We demonstrate an automatic 3D creation system that generates realistic 3D assets from a text or image prompt alone, without requiring any specialized 3D modeling skills. Users can either describe the object they envision in natural language or upload a reference image, such as a photo taken with their phone. The system then generates a high-quality 3D mesh that faithfully matches the user's input. To achieve this, we propose a coarse-to-fine framework. We first obtain a low-resolution mesh instantly from a pre-trained text- or image-conditioned 3D generative model. Using this coarse mesh as the initialization, we then optimize a high-resolution textured 3D mesh with fine-grained appearance guidance from large-scale 2D diffusion models. The system produces visually pleasing results in minutes, significantly faster than existing methods, while ensuring that the resulting 3D assets are precisely aligned with the input text or image prompt. With these capabilities, our demonstration provides a streamlined and intuitive platform for users to incorporate 3D creation into their daily lives.
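The two-stage control flow described in the abstract can be sketched roughly as follows. This is only an illustrative skeleton, not the system's implementation: the `Mesh` type, the `create_3d_asset` function, and both stub stages are hypothetical names introduced here. The stubs merely stand in for the pre-trained 3D generative model and the diffusion-guided optimization, neither of which the abstract specifies in detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vertex = Tuple[float, float, float]


@dataclass
class Mesh:
    """Hypothetical mesh container used only for this sketch."""
    vertices: List[Vertex]
    resolution: int        # nominal resolution label, not a real mesh property
    textured: bool = False


# Assumed stage signatures; the abstract does not name the actual models.
CoarseStage = Callable[[str], Mesh]
RefineStage = Callable[[Mesh, str], Mesh]


def create_3d_asset(prompt: str, coarse_stage: CoarseStage,
                    refine_stage: RefineStage) -> Mesh:
    """Coarse-to-fine flow: fast coarse mesh first, then guided refinement."""
    coarse = coarse_stage(prompt)        # stage 1: low-resolution initialization
    return refine_stage(coarse, prompt)  # stage 2: high-resolution, textured result


def stub_coarse(prompt: str) -> Mesh:
    """Stand-in for a pre-trained conditional 3D generator: returns one triangle."""
    return Mesh(vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
                resolution=32)


def stub_refine(mesh: Mesh, prompt: str) -> Mesh:
    """Stand-in for diffusion-guided optimization: adds edge midpoints as 'detail'."""
    refined: List[Vertex] = []
    n = len(mesh.vertices)
    for i, v in enumerate(mesh.vertices):
        w = mesh.vertices[(i + 1) % n]  # next vertex, wrapping around
        midpoint = tuple((a + b) / 2 for a, b in zip(v, w))
        refined.extend([v, midpoint])
    return Mesh(vertices=refined, resolution=mesh.resolution * 4, textured=True)


asset = create_3d_asset("a ceramic teapot", stub_coarse, stub_refine)
```

Passing the stages in as callables keeps the sketch independent of any particular model: the coarse stage only needs to be fast, and the refinement stage only needs to accept the coarse mesh as its starting point.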