Text-Guided 3D Head Synthesis Using Geometry Images

17 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Human Avatar, Generative Model, Optimal Transport, Text-Guided 3D Generation
Abstract: In recent years, text-guided 3D head generation has advanced considerably with the development of 3D morphable models (3DMMs) and their integration with vision–language models (VLMs). Nevertheless, existing approaches remain limited by the coarse level of detail in commonly used 3DMMs, which restricts their ability to synthesize fine-grained facial geometry and complex expressions. To address this limitation, we propose a novel framework for text-guided expressive 3D head generation. Unlike prior works that operate directly on mesh-based representations, our method adopts geometry images as the core 3D shape representation. The pipeline first computes a measure-preserving parameterization for each head mesh, minimizing area distortion while allowing local magnification of regions of interest. This parameterization enables the construction of geometry images, on which we train a conditional Denoising Diffusion Probabilistic Model (DDPM). By reformulating 3D generation as 2D image synthesis, our framework captures fine-grained geometric details and expressive deformations that mesh-based pipelines often fail to reproduce. Extensive quantitative and qualitative experiments demonstrate that our approach produces high-quality human avatars and consistently outperforms existing methods.
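To make the core idea concrete, the sketch below shows how a geometry image can be built once a 2D parameterization of the mesh is available: each vertex's 3D coordinates are resampled onto a regular grid in UV space, yielding an image whose three channels store (x, y, z). This is a minimal illustration only, not the paper's actual implementation; the function name `mesh_to_geometry_image`, the use of `scipy.interpolate.griddata`, and the toy parameterization are all assumptions, and the paper's measure-preserving (optimal-transport) parameterization is replaced here by a trivial one.

```python
import numpy as np
from scipy.interpolate import griddata

def mesh_to_geometry_image(vertices, uv, resolution=64):
    """Resample per-vertex 3D coordinates (vertices: (N, 3)) onto a
    regular grid over the UV parameter domain (uv: (N, 2) in [0, 1]^2),
    producing a (resolution, resolution, 3) geometry image.

    Illustrative sketch only: a real pipeline would use the mesh's
    measure-preserving parameterization, not a toy UV assignment.
    """
    grid_u, grid_v = np.meshgrid(
        np.linspace(0.0, 1.0, resolution),
        np.linspace(0.0, 1.0, resolution),
    )
    channels = []
    for c in range(3):
        # Linear interpolation inside the convex hull of the UV samples
        lin = griddata(uv, vertices[:, c], (grid_u, grid_v), method="linear")
        # Fill the remaining NaN pixels with nearest-neighbour values
        near = griddata(uv, vertices[:, c], (grid_u, grid_v), method="nearest")
        channels.append(np.where(np.isnan(lin), near, lin))
    return np.stack(channels, axis=-1)

# Toy example: random surface samples with a trivial parameterization
rng = np.random.default_rng(0)
uv = rng.uniform(size=(200, 2))
vertices = np.column_stack([uv, np.sin(4.0 * uv[:, 0])])  # fake 3D surface
img = mesh_to_geometry_image(vertices, uv, resolution=32)
print(img.shape)  # (32, 32, 3)
```

Once meshes are encoded this way, the 3D generation task reduces to synthesizing such 3-channel images, which is what allows a standard conditional DDPM to be trained on them.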
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 9761