Keywords: 3D scene generation, training-free, overlapping patch-wise flow
Abstract: In this paper, we propose Extend3D, a novel training-free pipeline for 3D scene generation from a single image, built upon an object-centric 3D generative model. To overcome the limitation that object-centric models have fixed-size latent spaces and thus cannot represent wide scenes, we extend the latent space by factors of $(a, b)$ in the $x$ and $y$ directions. We then divide the extended latent into overlapping patches, apply the object-centric model to each patch, and couple the patches at every timestep. In addition, since object-centric models are poor at sub-scene generation, we use the input image and a point cloud extracted by a depth estimator as priors to guide this process. Using the point cloud prior, we initialize the structure of the scene and refine occluded regions with iterative under-noised SDEdit. Both priors are also used to optimize the extended latent during denoising so that the denoising paths do not deviate from the sub-scene dynamics. We demonstrate that our method produces better results than previous methods in human preference evaluations. An ablation study shows that each component of Extend3D plays a crucial role in training-free 3D scene generation.
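For intuition, below is a minimal sketch of one way the overlapping patch-wise coupling could work: at each timestep, every patch of the extended latent is denoised with the fixed-size model, and overlapping predictions are averaged back together (a MultiDiffusion-style coupling rule). All names here (`denoise_patch`, `coupled_denoise_step`) are hypothetical; the paper's actual coupling mechanism may differ.

```python
import torch

def coupled_denoise_step(latent, t, patch_size, stride, denoise_patch):
    """One coupled denoising step over an extended latent of shape (C, H, W).

    `denoise_patch(patch, t)` is assumed to be the fixed-size object-centric
    model's single-step prediction for a patch of shape (C, p, p). The stride
    should tile the latent evenly so every location is covered by a patch.
    """
    C, H, W = latent.shape
    out = torch.zeros_like(latent)
    weight = torch.zeros(1, H, W)
    p = patch_size
    for y in range(0, H - p + 1, stride):
        for x in range(0, W - p + 1, stride):
            patch = latent[:, y:y + p, x:x + p]
            pred = denoise_patch(patch, t)       # fixed-size model call per patch
            out[:, y:y + p, x:x + p] += pred     # accumulate overlapping predictions
            weight[:, y:y + p, x:x + p] += 1.0   # count how many patches cover each cell
    return out / weight.clamp(min=1.0)           # average where patches overlap
```

Averaging in overlap regions is one simple way to keep neighboring patches consistent with each other at every timestep; it is presented here only as an illustration of the general idea, not as the authors' exact method.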
Supplementary Material: zip
Primary Area: generative models
Submission Number: 11309