Efficient Text-Guided 3D-Aware Generation With Score Distillation on 3D Distribution

Yiji Cheng, Fei Yin, Xiaoke Huang, Xintong Yu, Jiaxiang Liu, Shikun Feng, Yujiu Yang, Yansong Tang

Published: 2025, Last Modified: 06 Apr 2026, IEEE Trans. Circuits Syst. Video Technol. 2025, CC BY-SA 4.0
Abstract: Text-to-3D generation enables the creation of 3D content with infinite possibilities. Existing methods typically either train 3D generative models, which suffer from poor semantic alignment due to the scarcity of paired 3D data, or optimize a 3D representation with 2D diffusion guidance, which results in slow inference, low diversity, and Janus problems. In this paper, we introduce InstantDreamer, a model designed for text-guided 3D-aware generation in a single forward pass without requiring paired training datasets, thereby enhancing efficiency. To accomplish this, we extend score distillation to learn a 3D-aware semantics distribution. We distill priors from diffusion models into a 3D-aware generator, amortizing the optimization time required for new prompts and eliminating the need for paired training data. We equip the generator with hierarchical semantics conditioning, which explicitly allows the model to perceive the correspondence between the text distribution and the 3D latent space. These designs give our 3D generative model multi-view semantic consistency and feed-forward 3D generation capability, eliminating the need for score distillation-based optimization for each prompt. Both quantitative and qualitative results on mainstream benchmarks demonstrate that InstantDreamer generates multi-view semantically consistent 3D assets that are competitive with state-of-the-art methods. Our method outperforms previous approaches in terms of CLIP R-Precision (66.31) and FID (28.47) while also exhibiting a significant boost in generation speed.
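
For readers unfamiliar with score distillation, the following is a minimal sketch of the idea the abstract builds on. The first equation is the standard score distillation sampling (SDS) gradient for a single prompt, as commonly written in the per-prompt optimization literature. The second is one plausible amortized form in which a generator is trained over a distribution of prompts. The symbols $G_\theta$, $z$, $c$, $p(y)$, $p(z)$, and $p(c)$ are assumptions introduced here for illustration; the abstract does not state the exact objective used by InstantDreamer, and this sketch should not be read as the authors' formulation.

```latex
% Per-prompt SDS: x = g(\theta) is a rendering of the 3D representation \theta,
% x_t = \alpha_t x + \sigma_t \epsilon is its noised version,
% \hat{\epsilon}_\phi is the frozen diffusion model's noise prediction for prompt y.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]

% Assumed amortized form (illustrative only): a generator G_\theta maps a latent z,
% a text prompt y, and a camera pose c to a rendered view x = G_\theta(z, y, c),
% and the expectation additionally runs over prompts, latents, and poses.
\nabla_\theta \mathcal{L}_{\mathrm{amortized}}
  = \mathbb{E}_{y \sim p(y),\, z \sim p(z),\, c \sim p(c),\, t,\, \epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial G_\theta(z, y, c)}{\partial \theta}
    \right]
```

Under this reading, the cost of optimization is paid once during training of the generator, so a new prompt needs only a forward pass instead of a fresh optimization run.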