Figure A: Comparisons against the Concurrent Work CameraCtrl. Our results remain consistent across frames, whereas CameraCtrl's output contains obvious flickering artifacts and distortions.
CameraCtrl
Ours
Reference Video
A blue chair on a carpet in a living room.
Majestic mountains with an eagle gliding effortlessly through the sky.
Figure B: 3D Reconstruction Results of CamCo's Generated Videos. We provide novel-view renderings of the 3D scenes reconstructed from CamCo's output videos. This level of 3D consistency across video frames is hard to achieve with previous methods.
Generated Videos (Object Centric)
Reconstructed 3D Scenes (Object Centric)
Generated Videos (Indoor and Outdoor Scenes)
Reconstructed 3D Scenes (Indoor and Outdoor Scenes)
Figure C: CamCo Evaluated on In-the-wild Images. We provide additional results of CamCo on in-the-wild images.
Figure D: More Dynamic Results and Comparisons. We provide additional dynamic results of CamCo and MotionCtrl and invite the reviewers to compare them side by side. MotionCtrl tends to produce static results with little to no object motion.
MotionCtrl
Ours
Figure E: Qualitative Comparisons for Ablation Studies. We provide qualitative comparisons for our ablation studies and invite the reviewers to compare the results side by side.
(a) Effectiveness of our proposed Epipolar Constraint Attention (ECA). The floor texture is well preserved at novel viewpoints when ECA is present.
CameraCtrl's output shows severe inconsistency between frames.
CameraCtrl
Without ECA
With ECA
A view of a living room from an open doorway.
(b) Effectiveness of our proposed dynamic data curation pipeline. Without the proposed data curation pipeline, camera pose estimates are noisy and the video generator cannot generate both object motion and camera motion.
MotionCtrl, in comparison, generates only static scenes (the water does not move).