Abstract: Recent advances in visual generative models have substantially broadened the scope of scene synthesis across modalities such as video, 3D, and 4D, significantly expanding their applications across domains. Despite this progress, most existing systems treat scenes in isolation, lacking long-range spatial-temporal coherence and interactive control mechanisms. These shortcomings limit interactivity and composability, constraining such systems in scenarios such as immersive entertainment and education. To address this, we introduce DreamGen, a unified framework that transforms a single panoramic image into a fully interactive, panoramic 4D world. DreamGen operates through an integrated three-stage pipeline: first, it performs view-consistent 3D reconstruction via Gaussian Splatting, employing monocular depth estimation and diffusion-based inpainting to enrich and complete the scene; next, it simulates continuous camera trajectories to ensure geometric and temporal consistency; finally, it combines these outputs in a real-time, event-driven SuperSplat renderer that supports dynamic editing and immersive exploration. Extensive experiments on the comprehensive WorldScore benchmark demonstrate that DreamGen outperforms existing state-of-the-art methods in controllability, visual fidelity, and motion dynamics. Our approach not only sets a new standard for interactive and coherent 4D world generation but also opens promising avenues for applications in immersive entertainment, embodied AI, and advanced simulation.
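To make the three-stage pipeline in the abstract concrete, here is a minimal, hypothetical sketch of its control flow. The abstract does not specify an API, so every name here (estimate_depth, inpaint, fit_gaussians, simulate_trajectory, SupersplatRenderer, dreamgen) is an illustrative assumption, and each stage is stubbed with placeholder logic so the skeleton runs end to end; it is not the paper's actual implementation.

```python
# Hypothetical sketch of DreamGen's three-stage pipeline; all names and
# stub bodies are assumptions made for illustration only.
import numpy as np


def estimate_depth(panorama: np.ndarray) -> np.ndarray:
    # Stand-in for a monocular depth estimator (Stage 1 input).
    return np.ones(panorama.shape[:2], dtype=np.float32)


def inpaint(panorama: np.ndarray, depth: np.ndarray) -> np.ndarray:
    # Stand-in for diffusion-based inpainting that completes occluded regions.
    return panorama


def fit_gaussians(rgb: np.ndarray, depth: np.ndarray) -> dict:
    # Stage 1: lift pixels to 3D points and fit a Gaussian Splatting scene.
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    means = np.stack([xs, ys, depth], axis=-1).reshape(-1, 3)
    return {"means": means, "colors": rgb.reshape(-1, rgb.shape[-1])}


def simulate_trajectory(scene: dict, num_frames: int = 120) -> list:
    # Stage 2: sample a continuous camera path; neighboring poses differ
    # only slightly, which underpins geometric/temporal consistency.
    return [{"frame": t, "pose": np.eye(4)} for t in range(num_frames)]


class SupersplatRenderer:
    # Stage 3 stand-in: an event-driven renderer for editing/exploration.
    def __init__(self, scene: dict):
        self.scene = scene
        self.handlers = {}

    def on(self, event: str, handler) -> None:
        self.handlers[event] = handler  # e.g., edit or navigation events

    def play(self, frames: list) -> None:
        for frame in frames:
            pass  # a real renderer would rasterize splats at frame["pose"]


def dreamgen(panorama: np.ndarray) -> None:
    depth = estimate_depth(panorama)
    completed = inpaint(panorama, depth)
    scene = fit_gaussians(completed, depth)   # Stage 1: 3D reconstruction
    frames = simulate_trajectory(scene)       # Stage 2: camera trajectory
    viewer = SupersplatRenderer(scene)        # Stage 3: interactive viewing
    viewer.on("edit", lambda event: None)     # hypothetical edit hook
    viewer.play(frames)


dreamgen(np.zeros((64, 128, 3), dtype=np.float32))
```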
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Text/Image-to-4D Generation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 5255