Keywords: Multi-view, Generative Models, Diffusion Models
TL;DR: The paper presents a new method for multi-view synthesis using a joint diffusion model that outperforms state-of-the-art methods.
Abstract: Multi-view observations offer a broader perception of the real world than observations acquired from a single viewpoint. While existing multi-view 2D diffusion models for novel view synthesis typically rely on a single conditioning reference image, only a few methods accommodate multiple reference images, and they do so by explicitly conditioning the generation process through tailored attention mechanisms. In contrast, we introduce DiMViS, a novel method enabling conditional generation in multi-view settings by means of a joint diffusion model. DiMViS builds on a pre-trained diffusion model and combines it with an innovative masked diffusion process that implicitly learns the underlying conditional data distribution, endowing our method with the ability to produce multiple images given a flexible number of reference views. Our experimental evaluation demonstrates DiMViS's superior performance compared to current state-of-the-art methods, while achieving reference-to-target and target-to-target visual consistency.
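The general idea behind masked-diffusion conditioning, as described in the abstract, can be sketched as follows. The paper's actual formulation is not given here, so every name, shape, and design choice below is a hypothetical illustration: noise is added only to the target views while reference views are kept clean, and the denoiser is supervised only on the targets, so conditioning is learned implicitly rather than through dedicated attention layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_diffusion_loss(views, ref_mask, sigma, denoiser):
    """Illustrative sketch (not the paper's implementation): corrupt only
    the target views, keep reference views clean, and compute the
    noise-prediction loss over the target views alone."""
    noise = rng.standard_normal(views.shape)
    keep = ref_mask[:, None, None]                        # broadcast over H, W
    noisy = np.where(keep, views, views + sigma * noise)  # references stay clean
    pred = denoiser(noisy, ref_mask, sigma)               # model sees which views are clean
    return float(np.mean(((pred - noise) ** 2)[~ref_mask]))

# Toy setup: 4 views of 8x8 "images"; the first two act as references.
views = rng.standard_normal((4, 8, 8))
ref_mask = np.array([True, True, False, False])
dummy_denoiser = lambda x, mask, s: np.zeros_like(x)      # placeholder for the network
loss = masked_diffusion_loss(views, ref_mask, 0.5, dummy_denoiser)
```

Because the mask is an input rather than a fixed architectural choice, the same model can, in principle, be conditioned on any number of reference views at inference time.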
Submission Number: 95