Prioritizing Faithfulness: Efficient Zero-Shot Novel View Synthesis with Adaptive Latent Modulation

16 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | Readers: Everyone | CC BY 4.0
Keywords: NVS, Zero-shot
TL;DR: Faithfulness-oriented, zero-shot, camera-controlled NVS with a video diffusion model
Abstract: The challenge of camera-controlled novel view synthesis (NVS) lies in balancing high visual fidelity with strict faithfulness to the source scene. We argue that the currently dominant approaches, which rely on finetuning large-scale diffusion models, often over-emphasize fidelity while struggling with faithfulness due to their generative nature. To address this, we propose a zero-shot NVS pipeline that prioritizes faithfulness and efficiency. Our method introduces two key contributions, both applied during inference: (1) Test-time Latent Homography Deformation, an on-the-fly homography optimization that deforms latents for global motion consistency, and (2) Spatially Adaptive RePaint (SA-RePaint), an extension of RePaint that achieves both structural consistency and texture fidelity through a mathematically grounded, region-wise balancing of these two objectives. Our evaluations demonstrate substantial improvements in faithfulness and camera accuracy with competitive perceptual scores, showing that faithfulness, quality, and efficiency can be integrated successfully. This work offers a promising direction for NVS that rebalances the focus towards greater authenticity.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7482