Abstract: Despite the latest remarkable advances in generative modeling, the efficient generation of high-quality 3D objects from textual prompts remains a difficult task. A key challenge lies in data scarcity: the most extensive 3D datasets encompass merely millions of samples, while their 2D counterparts contain billions of text-image pairs. To address this, we propose a novel approach that harnesses the power of large, pretrained 2D diffusion models. More specifically, our approach, HexaGen3D, fine-tunes a pretrained text-to-image model to jointly predict 6 orthographic projections and the corresponding 3D latent. We then decode these latents into a textured mesh. HexaGen3D does not require per-sample optimization and can infer high-quality, diverse objects from textual prompts in 7 seconds, offering significantly better quality-to-latency trade-offs than existing approaches. Furthermore, HexaGen3D demonstrates strong generalization to new objects and compositions.
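The abstract describes a two-stage, feed-forward pipeline: a fine-tuned 2D diffusion backbone jointly denoises latents for 6 orthographic views together with a 3D latent, and the 3D latent is then decoded into a textured mesh. The PyTorch sketch below illustrates only that data flow; every class name, tensor shape, and the toy sampler update are our own assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: the abstract specifies joint prediction of 6
# orthographic projections plus a 3D latent, but not the architecture.

class JointDiffusionBackbone(nn.Module):
    """Placeholder for the fine-tuned text-to-image diffusion model."""
    def __init__(self, view_ch=4, latent3d_ch=8):
        super().__init__()
        self.view_head = nn.Conv2d(view_ch, view_ch, 3, padding=1)
        self.latent3d_head = nn.Conv2d(latent3d_ch, latent3d_ch, 3, padding=1)

    def forward(self, view_latents, latent_3d, t, text_emb):
        # A real backbone would condition on the timestep t and the text
        # embedding; this stub ignores both and just predicts "noise".
        b, v, c, h, w = view_latents.shape
        eps_views = self.view_head(view_latents.flatten(0, 1)).view(b, v, c, h, w)
        eps_3d = self.latent3d_head(latent_3d)
        return eps_views, eps_3d


@torch.no_grad()
def generate(text_emb, backbone, mesh_decoder, steps=20):
    """Feed-forward inference sketch: joint denoising, then mesh decoding."""
    views = torch.randn(1, 6, 4, 32, 32)    # latents for 6 orthographic views
    latent_3d = torch.randn(1, 8, 64, 64)   # assumed shape for the 3D latent
    for t in reversed(range(steps)):
        eps_v, eps_3 = backbone(views, latent_3d, t, text_emb)
        views = views - eps_v / steps        # toy update; real samplers differ
        latent_3d = latent_3d - eps_3 / steps
    return mesh_decoder(latent_3d)           # decode 3D latent -> textured mesh


backbone = JointDiffusionBackbone()
mesh = generate(text_emb=None, backbone=backbone, mesh_decoder=nn.Identity())
```

The single forward pass per denoising step, with no per-sample optimization loop, is what the sketch is meant to highlight: it is the structural reason the abstract can claim second-scale inference latency.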