Abstract: This paper presents an approach to multi-view generation that offers comprehensive control over both perspective (viewpoint) and non-perspective attributes (such as depth maps). Our controllable dual-branch pipeline, named Depth Guided Branched Diffusion (DGBD), leverages depth maps and perspective information to generate images from alternative viewpoints while preserving shape and size fidelity. In the first DGBD branch, we fine-tune a pre-trained diffusion model on multi-view data, introducing a regularized batch-aware self-attention mechanism that promotes multi-view consistency and generalization; direct control over perspective is then achieved through cross-attention conditioned on camera position. The second DGBD branch introduces non-perspective control through depth maps. Qualitative and quantitative experiments validate the effectiveness of our approach, which matches or surpasses state-of-the-art novel-view and multi-view synthesis methods.
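To fix ideas, the minimal PyTorch sketch below illustrates one plausible realization of the two attention mechanisms the abstract names: self-attention whose scope spans all views in a batch, and cross-attention conditioned on camera position. This is an assumption for illustration, not the authors' implementation; the module names (`BatchAwareSelfAttention`, `CameraCrossAttention`), the (B*V, N, C) token layout, the 6-dimensional camera parameterization, and the residual update are all hypothetical choices, and the regularization term mentioned in the abstract is not shown.

```python
# Hypothetical sketch (not the paper's released code): batch-aware
# self-attention across views plus camera-conditioned cross-attention,
# assuming ViT-style tokens of shape (B*V, N, C) with the V views of
# each scene stored contiguously in the batch dimension.
import torch
import torch.nn as nn


class BatchAwareSelfAttention(nn.Module):
    """Self-attention whose keys/values span all V views of the same
    scene, letting each view attend to the others for consistency."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, num_views: int) -> torch.Tensor:
        bv, n, c = x.shape                     # (B*V, N, C)
        b = bv // num_views
        # Concatenate tokens from all views of one scene along the
        # sequence axis: (B, V*N, C). Attention now mixes across views.
        x = x.reshape(b, num_views * n, c)
        out, _ = self.attn(x, x, x)
        return out.reshape(bv, n, c)


class CameraCrossAttention(nn.Module):
    """Cross-attention that injects a camera-position embedding,
    giving direct control over the rendered perspective."""

    def __init__(self, dim: int, cam_dim: int = 6, num_heads: int = 8):
        super().__init__()
        self.cam_proj = nn.Linear(cam_dim, dim)  # embed camera params
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, camera: torch.Tensor) -> torch.Tensor:
        # camera: (B*V, cam_dim) raw extrinsics (a hypothetical
        # position + orientation encoding).
        cam_tokens = self.cam_proj(camera).unsqueeze(1)  # (B*V, 1, C)
        out, _ = self.attn(x, cam_tokens, cam_tokens)
        return x + out                                   # residual update


if __name__ == "__main__":
    B, V, N, C = 2, 4, 64, 128               # scenes, views, tokens, width
    tokens = torch.randn(B * V, N, C)
    cams = torch.randn(B * V, 6)
    tokens = BatchAwareSelfAttention(C)(tokens, num_views=V)
    tokens = CameraCrossAttention(C)(tokens, cams)
    print(tokens.shape)                       # torch.Size([8, 64, 128])
```

In practice such blocks would be interleaved with the denoising backbone of the fine-tuned diffusion model; the depth-conditioned second branch would add its own conditioning pathway, which is beyond the scope of this sketch.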