DreamLCM: Towards High Quality Text-to-3D Generation Via Latent Consistency Model

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Recently, the text-to-3D task has developed rapidly due to the advent of the SDS method. However, SDS often generates 3D objects of poor quality due to the over-smoothing issue. This issue is attributed to two factors: 1) single-step DDPM inference produces poor guidance gradients; 2) the randomness of the input noises and timesteps averages out the details of the 3D content. In this paper, to address this issue, we propose DreamLCM, which incorporates the Latent Consistency Model (LCM). DreamLCM leverages the powerful image generation capability inherent in LCM, enabling the generation of consistent and high-quality guidance, i.e., predicted noises or images. Powered by the improved guidance, the proposed method can provide accurate and detailed gradients to optimize the target 3D models. In addition, we propose two strategies to further enhance generation quality. First, we propose a guidance calibration strategy that uses the Euler solver to calibrate the guidance distribution and accelerate the convergence of 3D models. Second, we propose a dual-timestep strategy, which increases the consistency of the guidance and optimizes 3D models from geometry to appearance. Experiments show that DreamLCM achieves state-of-the-art results in both generation quality and training efficiency.
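The abstract describes an SDS-style optimisation loop in which an LCM-predicted clean image, rather than a single-step DDPM noise estimate, supplies the guidance gradient. The following is a minimal, self-contained sketch of that idea; the functions `lcm_predict_x0` and `sds_like_step`, the toy noising schedule, and the fixed `target` image are all illustrative assumptions, not the paper's actual implementation (which operates on a rendered 3D model and a real Latent Consistency Model).

```python
import numpy as np

def lcm_predict_x0(noisy, target, t):
    # Hypothetical stand-in for an LCM one-step denoiser: the paper's model
    # would predict a clean guidance image from the noised render; here we
    # pull the noised input toward a fixed "target" to keep the sketch
    # self-contained. t in [0, 1) plays the role of the diffusion timestep.
    alpha = 1.0 - t
    return alpha * noisy + (1.0 - alpha) * target

def sds_like_step(render, target, t, lr=0.5):
    """One optimisation step: noise the current render, ask the (toy) LCM for
    a clean guidance image, and move the render toward that guidance."""
    rng = np.random.default_rng(0)                  # fixed seed for a deterministic sketch
    noise = rng.normal(size=render.shape)
    noisy = (1.0 - t) * render + t * noise          # toy forward-noising of the render
    guidance = lcm_predict_x0(noisy, target, t)     # LCM-style x0 prediction as guidance
    grad = render - guidance                        # image-space guidance gradient
    return render - lr * grad

render = np.zeros((4, 4))       # stand-in for the differentiably rendered image
target = np.ones((4, 4))        # stand-in for the prompt-consistent image the LCM favours
for step in range(50):
    render = sds_like_step(render, target, t=0.5)
```

After the loop, `render` has been pulled substantially toward `target`, mirroring how higher-quality LCM guidance yields more accurate gradients for the 3D model than a noisy single-step DDPM estimate.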
Primary Subject Area: [Content] Vision and Language
Relevance To Conference: Our paper targets the text-to-3D task, which involves three modalities: text, 2D images, and 3D objects. Given a text prompt describing a 3D scene or object, our work uses the recent Latent Consistency Model to generate 2D guidance images, which indicate the gradient used to update the 3D model. The pipeline involves supervision and interaction between different modalities, and our work improves the supervision between the 2D and 3D modalities. Moreover, papers on the same topic and task were accepted at ACM MM 2023.
Supplementary Material: zip
Submission Number: 1610
