MVGaussian: High-Fidelity Text-to-3D Content Generation with Multi-View Guidance and Surface Densification
Abstract: The field of text-to-3D content generation has made significant progress in generating realistic 3D objects, with existing methodologies such as Score Distillation Sampling (SDS) offering promising guidance. However, these methods often encounter the Janus problem: multi-face ambiguities caused by imprecise guidance. Additionally, while recent advances in 3D Gaussian splatting (3DGS) have shown its efficacy for representing 3D volumes, optimization of this representation remains largely unexplored. This paper introduces a unified framework for text-to-3D content generation that addresses these critical gaps. Our approach uses multi-view guidance to iteratively form the structure of the 3D model, progressively enhancing detail and accuracy. We also introduce a novel densification algorithm that aligns Gaussians close to the surface, improving the structural integrity and fidelity of the generated models. Extensive experiments validate our approach, showing that it produces high-quality visual output at low time cost. Notably, our method reaches high-quality results within half an hour of training, a substantial efficiency gain over recent 3DGS-based methods such as GSGen (~2 hours) and LucidDreamer (~35 minutes), reducing training time by up to 2x while achieving comparable or better results.
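The abstract's surface-aligned densification can be pictured with a minimal sketch. This is not the paper's algorithm, only an illustration of the underlying idea under a simplifying assumption: if a signed distance function (SDF) of the target surface were available, each Gaussian center could be pulled onto the surface by stepping along the negative SDF gradient. The function name `align_to_surface` and the SDF inputs are hypothetical.

```python
import numpy as np

def align_to_surface(means, sdf, sdf_grad, step=1.0):
    """Illustrative sketch (not the paper's method): move Gaussian
    centers toward the zero level set of a signed distance function
    by stepping along the negative gradient (the surface normal)."""
    d = sdf(means)        # signed distance for each center, shape (N,)
    n = sdf_grad(means)   # unit outward normals, shape (N, 3)
    return means - step * d[:, None] * n

# Toy example: the unit sphere, whose SDF is |p| - 1 with normal p/|p|.
sdf = lambda p: np.linalg.norm(p, axis=1) - 1.0
sdf_grad = lambda p: p / np.linalg.norm(p, axis=1, keepdims=True)

rng = np.random.default_rng(0)
pts = rng.normal(size=(16, 3)) * 2.0   # scattered Gaussian centers
aligned = align_to_surface(pts, sdf, sdf_grad)
```

For an exact SDF a single full step lands every center on the surface; in practice a learned or estimated distance field would require small repeated steps interleaved with the usual 3DGS split/clone densification.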
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have uploaded the revised version based on the reviewers' comments.
Assigned Action Editor: ~Chinmay_Hegde1
Submission Number: 4347