FlexWorld: Progressively Expanding 3D Scenes for Flexible-View Exploration

Published: 18 Sept 2025, Last Modified: 11 Dec 2025 | NeurIPS 2025 poster | CC BY 4.0
Keywords: 3D scene generation, Diffusion model, 3D Gaussian Splatting
TL;DR: FlexWorld generates flexible-view 3D scenes from single images using a progressively expanding 3D Gaussian splatting representation and a fine-tuned video-to-video model, outperforming existing methods in quality and exploration flexibility.
Abstract: Generating flexible-view 3D scenes, including 360° rotation and zooming, from single images is challenging due to a lack of 3D data. To this end, we introduce FlexWorld, a novel framework that progressively constructs a persistent 3D Gaussian splatting representation by synthesizing and integrating new 3D content. To handle novel view synthesis under large camera variations, we leverage an advanced pre-trained video model fine-tuned on accurate depth-estimated training pairs. By combining geometry-aware scene integration and optimization, FlexWorld refines the scene representation, producing visually consistent 3D scenes with flexible viewpoints. Extensive experiments demonstrate the effectiveness of FlexWorld in generating high-quality novel view videos and flexible-view 3D scenes from single images, achieving superior visual quality across multiple popular metrics and datasets compared to existing state-of-the-art methods. Additionally, FlexWorld supports extrapolating from existing 3D scenes, further extending its applicability. Qualitatively, we highlight that FlexWorld can generate high-fidelity scenes that support exploration through 360° rotation and zooming. Our code is available at https://github.com/ML-GSAI/FlexWorld.
Supplementary Material: zip
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 14193