Keywords: Multi-view generation, Street scenes, Stable diffusion
Abstract: Multi-view Stable Diffusion has been proposed and applied to indoor and in-the-wild scene generation. However, the generation of outdoor scenes, especially urban street scenes, has not yet been well studied; it is more complicated than existing indoor or in-the-wild scene generation because street scenes contain more objects and structures. In this work, we focus on street scene generation with a multi-view stable diffusion model guided by structure prompts, such as segmentation maps. To this end, we propose StreetDiffusion, which employs a dual-branch architecture that integrates panoramic and local information, with structural priors injected into both branches, to generate highly consistent and realistic multi-view street scene images. To study the street scene generation problem, we also introduce a large multi-view street scene dataset, Street 360, which consists of 10K panoramic images of urban streets. Experiments demonstrate that the proposed StreetDiffusion model generates high-quality street scenes and shows a clear advantage on the street scene generation task over existing multi-view generation models designed for indoor or in-the-wild scenes.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 10723