Abstract: Generating 3D scenes from natural language holds
great promise for applications in gaming, film, and design.
However, existing methods struggle with automation, 3D consistency, and fine-grained control. We present DreamScene, an
end-to-end framework for high-quality and editable 3D scene
generation from text or dialogue. DreamScene begins with a
scene planning module, where a GPT-4 agent infers object
semantics and spatial constraints to construct a hybrid graph.
A graph-based placement algorithm then produces a structured,
collision-free layout. Based on this layout, Formation Pattern
Sampling (FPS) generates object geometry using multi-timestep
sampling and reconstructive optimization, enabling fast and
realistic synthesis. To ensure global consistency, DreamScene
employs a progressive camera sampling strategy tailored to both
indoor and outdoor settings. Finally, the system supports fine-grained scene editing, including object movement, appearance
changes, and 4D dynamic motion. Experiments demonstrate that
DreamScene surpasses prior methods in quality, consistency,
and flexibility, offering a practical solution for open-domain
3D content creation. Code and demos are available at
https://jahnsonblack.github.io/DreamScene-Full/