Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

ICLR 2026 Conference Submission13561 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: City generation, View generation, 3DGS, Satellite imagery, Diffusion models

TL;DR: Skyfall-GS converts satellite images to explorable 3D urban scenes using diffusion models, achieving superior geometry and textures with real-time rendering performance.

Abstract: Synthesizing large-scale, explorable, and geometrically accurate 3D urban scenes is a challenging yet valuable task for enabling immersive and embodied applications. The main challenge is the lack of large-scale, high-quality real-world 3D scans for training generalizable generative models. In this paper, we take an alternative route to creating large-scale 3D scenes by combining readily available satellite imagery, which supplies realistic coarse geometry, with an open-domain diffusion model, which produces high-quality close-up appearances. We propose **Skyfall-GS**, the first city-block-scale 3D scene creation framework that requires no costly 3D annotations and supports real-time, immersive 3D exploration. We design a curriculum-driven iterative refinement strategy that progressively improves geometric completeness and texture photorealism. Extensive experiments demonstrate that Skyfall-GS provides geometry with improved cross-view consistency and more realistic textures than state-of-the-art approaches.

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 13561
