CAMERA CONDITIONED VIDEO GENERATION WITH IMPROVED POSE FIDELITY

17 Sept 2025 (modified: 24 Oct 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: camera-conditioned video generation, novel view generation
TL;DR: A camera-trajectory-conditioned video generation framework that enables synthesis along unconstrained trajectories without external geometric priors.
Abstract: Novel-view video generation for dynamic scenes has emerged as a compelling research direction with the advancement of video diffusion models. However, current approaches face key constraints that restrict their flexibility. Methods built on Image-to-Video base models inherit the bias of the base model, which forces the target camera pose of the initial frame to remain close to the source. The limited diversity of camera trajectories in currently available datasets likewise confines trained models to a narrow range of camera trajectories. Projection-based methods that rely on depth estimation suffer from projection errors in the depth-warped input video. In this paper, we present FreeCam, a camera-trajectory-conditioned Video-to-Video generation framework that enables depth-free novel-view video generation along unconstrained camera trajectories. We introduce infinite homography warping, which encodes 3D camera rotations directly in 2D latent space without depth, enabling high camera pose fidelity. We also augment existing multi-view datasets, whose videos share identical initial frames, into a dataset with arbitrary trajectories and heterogeneous intrinsic parameters, enabling training on diverse camera motions and focal lengths. Our experiments show that FreeCam achieves higher trajectory precision than existing state-of-the-art approaches while preserving visual fidelity. Notably, despite being trained exclusively on synthetic data, FreeCam generalizes effectively to real-world videos. Ablation studies and comparative analyses confirm the complementary advantages of the proposed data processing pipeline and infinite homography warping, which together form a framework for precise and flexible camera motion control in video synthesis.
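For readers unfamiliar with the term, the classical infinite homography relates two views that differ only by a camera rotation (and possibly by intrinsics): a source pixel p_s maps to p_t ~ K_t R K_s^{-1} p_s, with no depth involved. The sketch below illustrates only this textbook pixel-space relation, assuming pinhole intrinsics and a rotation R that takes source-camera coordinates to target-camera coordinates. It is not the FreeCam implementation, which the abstract describes as operating in latent space; the helper names (intrinsics, rotation_y, infinite_homography, warp_points) are hypothetical.

```python
import numpy as np


def intrinsics(focal, cx, cy):
    """Pinhole intrinsic matrix with square pixels and zero skew (assumed)."""
    return np.array([[focal, 0.0, cx],
                     [0.0, focal, cy],
                     [0.0, 0.0, 1.0]])


def rotation_y(angle_rad):
    """Rotation about the camera y-axis by angle_rad."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def infinite_homography(K_src, K_tgt, R_tgt_from_src):
    """H = K_tgt @ R @ inv(K_src); valid for rotation-only camera motion."""
    return K_tgt @ R_tgt_from_src @ np.linalg.inv(K_src)


def warp_points(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of pixel coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to pixels


# Example: 10-degree rotation about y, with a longer focal length in the target view.
K_src = intrinsics(500.0, 320.0, 240.0)
K_tgt = intrinsics(800.0, 320.0, 240.0)
H = infinite_homography(K_src, K_tgt, rotation_y(np.deg2rad(10.0)))
corners = warp_points(H, [[0, 0], [639, 0], [639, 479], [0, 479]])
```

Because the mapping depends only on rotation and intrinsics, it is exact for any scene depth under pure rotation. Translation introduces depth-dependent parallax that this formula does not capture.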
Supplementary Material: zip
Primary Area: generative models
Submission Number: 8282