Unleashing the Power of 2D Diffusion Representation for High Fidelity 3D Generation

20 Sept 2025 (modified: 13 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: 3D generation, 3DGS, sparse structure latent
Abstract: State-of-the-art (SOTA) approaches to 3D content generation are predominantly built on a sequential framework: a geometric shape is generated first, and texture is then estimated using geometric cues. As a result, these methods typically take on the order of minutes to produce both geometry and texture, and they often suffer from significant shape-texture misalignment caused by decoupling the two stages. To mitigate these limitations, recent works jointly model geometry and texture within a unified framework, which improves shape-texture consistency. Nevertheless, these joint approaches still struggle with precise texture modeling, largely because fine-grained texture details are lost during latent feature learning. To address this remaining challenge, we propose a novel joint architecture that preserves the advantage of unified geometry-texture modeling while capturing fine-grained texture details by integrating image diffusion features into the latent feature learning process. We further observe that modeling such fine-grained texture features is itself challenging, owing to the inherent difficulty of mapping 2D visual details onto 3D surfaces. To alleviate this, we introduce a diffusion-based module that strengthens cross-modal alignment between 3D structures and 2D image inputs, enabling rich, fine-grained texture features to be learned directly from 2D image conditions. Extensive empirical evaluations demonstrate that our approach outperforms existing SOTA methods, delivering substantial improvements in texture modeling quality.
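The abstract does not give implementation details, but the core idea of injecting 2D image-diffusion features into a sparse 3D structure latent can be pictured as a cross-attention conditioning step. The sketch below is a minimal, hypothetical illustration: the module name `DiffusionFeatureInjector`, the use of `torch.nn.MultiheadAttention` for the cross-modal alignment, and all dimensions are assumptions made for illustration rather than the paper's actual architecture.

```python
# Minimal sketch (hypothetical): fuse 2D diffusion features into 3D latent tokens
# via cross-attention. Names, dimensions, and module choices are assumptions,
# not details from the submission.
import torch
import torch.nn as nn


class DiffusionFeatureInjector(nn.Module):
    """Condition a sparse 3D structure latent on 2D image-diffusion features (illustrative)."""

    def __init__(self, latent_dim: int = 512, image_feat_dim: int = 1024, num_heads: int = 8):
        super().__init__()
        # Project image-diffusion features into the 3D latent's embedding space.
        self.img_proj = nn.Linear(image_feat_dim, latent_dim)
        # Cross-attention: 3D latent tokens query the 2D feature map.
        self.cross_attn = nn.MultiheadAttention(latent_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(latent_dim)

    def forward(self, latent_tokens: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        """
        latent_tokens: (B, N_voxels, latent_dim)  -- tokens of a sparse 3D structure latent
        image_feats:   (B, H*W, image_feat_dim)   -- features from a 2D image diffusion model
        """
        kv = self.img_proj(image_feats)
        attended, _ = self.cross_attn(query=latent_tokens, key=kv, value=kv)
        # Residual update keeps the original 3D structure latent intact.
        return self.norm(latent_tokens + attended)


if __name__ == "__main__":
    injector = DiffusionFeatureInjector()
    latent = torch.randn(2, 4096, 512)          # e.g. active voxels of a sparse latent grid
    img_feats = torch.randn(2, 32 * 32, 1024)   # e.g. mid-block features of an image diffusion UNet
    out = injector(latent, img_feats)
    print(out.shape)  # torch.Size([2, 4096, 512])
```

Under these assumptions, the residual cross-attention lets each 3D latent token attend to the 2D feature map, which is one common way to realize the kind of cross-modal alignment between 3D structures and 2D image conditions that the abstract describes.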
Primary Area: generative models
Submission Number: 22803