Text to 3D Object Generation for Scalable Room Assembly

Published: 04 Mar 2025 · Last Modified: 17 Apr 2025 · ICLR 2025 Workshop SynthData · CC BY 4.0
Keywords: Object Generation, Text-to-3D Generation, Multi-view Diffusion, NeRF, Synthetic meshes
TL;DR: This paper presents an automated text-to-3D mesh system that leverages multi-view diffusion and NeRF-based meshing to generate synthetic assets and build high-fidelity, scalable 3D indoor scenes.
Abstract: Modern machine learning models for scene understanding, such as depth estimation and object tracking, rely on large, high-quality datasets that mimic real-world deployment scenarios. To address data scarcity, we propose an end-to-end system for generating scalable, high-quality, and customizable synthetic 3D indoor scenes. By integrating and adapting text-to-image and multi-view diffusion models with Neural Radiance Field (NeRF)-based meshing, the system generates high-fidelity 3D object assets from text prompts and incorporates them into predefined floor plans using a rendering tool. By introducing novel loss functions and training strategies into existing methods, the system supports on-demand scene generation, aiming to alleviate the scarcity of currently available data, which is generally crafted manually by artists. This system advances the role of synthetic data in addressing machine learning training limitations, enabling more robust and generalizable models for real-world applications.
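The abstract describes a pipeline of the form text prompt → multi-view diffusion → NeRF-based meshing → placement into a predefined floor plan. The following is a minimal, hypothetical Python sketch of that flow; every function and class name here is an illustrative placeholder and does not correspond to the authors' implementation or to any real library API.

```python
# Hypothetical sketch of the described text-to-3D room-assembly pipeline.
# All names are placeholders; the real stages would call diffusion and
# NeRF reconstruction models rather than these stubs.
from dataclasses import dataclass, field


@dataclass
class MeshAsset:
    """A generated 3D asset with a name and placeholder geometry."""
    name: str
    vertices: list = field(default_factory=list)
    faces: list = field(default_factory=list)


def generate_multiview_images(prompt: str, num_views: int = 4) -> list:
    """Stand-in for the text-to-image + multi-view diffusion stage."""
    return [f"{prompt}_view_{i}" for i in range(num_views)]


def nerf_to_mesh(views: list, name: str) -> MeshAsset:
    """Stand-in for NeRF-based reconstruction and mesh extraction."""
    return MeshAsset(name=name)


def assemble_room(floor_plan: dict, assets: dict) -> list:
    """Place generated assets at positions given by a predefined floor plan."""
    scene = []
    for obj_name, position in floor_plan.items():
        scene.append({"asset": assets[obj_name].name, "position": position})
    return scene


if __name__ == "__main__":
    prompts = {"armchair": "a modern grey armchair", "lamp": "a brass floor lamp"}
    floor_plan = {"armchair": (1.0, 0.0, 2.5), "lamp": (0.2, 0.0, 0.8)}

    # 1) Per object: text prompt -> multi-view images -> NeRF -> mesh.
    assets = {
        name: nerf_to_mesh(generate_multiview_images(prompt), name)
        for name, prompt in prompts.items()
    }
    # 2) Insert the meshes into the predefined floor plan for rendering.
    print(assemble_room(floor_plan, assets))
```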
Submission Number: 76