Keywords: world models, procedural generation, text-to-3D, training-free, deterministic compilation, LLMs, vision-language models, 3D environment generation, spatial layout
TL;DR: Training-free pipeline that compiles narrative text into an executable, traversable 3D world using LLM planning, deterministic layout, and text-to-3D assets.
Abstract: Recent work on world generation has largely emphasized foundation-scale generative models trained on large multimodal datasets, often requiring substantial computational resources. In this work, we explore an alternative perspective: treating narrative world construction as a structured compilation problem rather than an end-to-end learned generation problem. We present a modular, training-free framework that uses multimodal large language models only for semantic abstraction, while delegating topology construction, spatial layout, traversability, and environment assembly to deterministic algorithms. The resulting pipeline converts narrative text into story-driven, navigable 3D worlds through lightweight API calls and executable compilation in the Godot engine. Across 20 prompts, the system produces collision-free, overlap-free environments with 100% door reachability on commodity hardware. Our results suggest that world model research can advance not only through larger learned generators, but also through decomposition, controllability, and systems-level structure.
Submission Number: 43
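The abstract reports overlap-free layouts and 100% door reachability. As a rough illustration of how such properties could be checked deterministically, the sketch below tests axis-aligned room rectangles for overlap and measures the fraction of doors reachable from a start room by breadth-first search. Everything here is an assumption made for illustration: the `Room` type, the function names, the 2D rectangle representation, and the definition of reachability are hypothetical and are not taken from the authors' pipeline.

```python
from collections import deque
from itertools import combinations
from typing import NamedTuple


class Room(NamedTuple):
    """Hypothetical axis-aligned room footprint (not the authors' data model)."""
    name: str
    x: float
    y: float
    w: float
    h: float


def rooms_overlap(a: Room, b: Room) -> bool:
    # Strict inequalities: rooms that only share a wall are not overlapping.
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def layout_is_overlap_free(rooms: list[Room]) -> bool:
    # Check every unordered pair of rooms once.
    return not any(rooms_overlap(a, b) for a, b in combinations(rooms, 2))


def door_reachability(rooms: list[Room],
                      doors: list[tuple[str, str]],
                      start: str) -> float:
    """Fraction of doors reachable from `start`, treating doors as undirected edges."""
    adjacency: dict[str, set[str]] = {r.name: set() for r in rooms}
    for a, b in doors:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Breadth-first search over the room graph.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)

    if not doors:
        return 1.0  # Assumed convention: no doors means nothing is unreachable.
    # A door is reachable when the room on either side is; with undirected
    # edges, checking one side is sufficient.
    return sum(1 for a, _ in doors if a in seen) / len(doors)


# Illustrative layout: two adjacent rooms and two detached ones.
rooms = [
    Room("hall", 0, 0, 4, 4),
    Room("kitchen", 4, 0, 3, 4),
    Room("cellar", 10, 10, 2, 2),
    Room("vault", 12, 10, 2, 2),
]
connected = [("hall", "kitchen")]
partially_connected = [("hall", "kitchen"), ("cellar", "vault")]
```

With this layout, `layout_is_overlap_free(rooms)` holds because adjacent rooms only share walls, `door_reachability(rooms, connected, "hall")` is 1.0, and adding the detached cellar-vault door drops the score to 0.5. A check of this kind is cheap enough to run on every generated world, which fits the abstract's emphasis on deterministic, verifiable structure.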