WorldCrafter: Dynamic Scene Generation from a Single Image with Geometric and Temporal Consistency

Haoye Dong; Gim Hee Lee

15 Sept 2025 (modified: 12 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0

Keywords: Scene Generation, Video Generation

Abstract: We present WorldCrafter, a novel framework that enables interactive dynamic scene generation from a single image by leveraging geometry-aware and temporal modeling. Existing methods often suffer from texture distortion, structural inaccuracies, and temporal flickering under large viewpoint changes. These issues are mainly caused by explicit pixel-wise reprojection strategies. To address these challenges, WorldCrafter introduces two complementary modules: 1) Geometry-aware Video Depth Refinement, which enhances structural fidelity by refining depth with multi-frame geometric priors and semantic cues; and 2) Object-consistent Temporal Modeling, which disentangles video frames into object-level layers to improve coherence between static backgrounds and dynamic foregrounds. Together, these components form a unified rendering-inpainting framework for photorealistic, camera-controllable dynamic scene generation. Experiments demonstrate that WorldCrafter produces geometrically accurate and temporally coherent results across diverse scenes and camera trajectories.

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 5999
