Reconstruct, Inpaint, Test-Time Finetune:
Dynamic Novel-view Synthesis from Monocular Videos

NeurIPS Paper ID 13013

Qualitative Comparison on Kubric-4D and ParallelDomain-4D

[Figure columns, left to right: input view, GT point cloud, GCD, TrajCrafter, CogNVS, GT novel view]
Qualitative comparison on Kubric-4D and ParallelDomain-4D. Note that TrajectoryCrafter generates a reasonable background for unseen scene regions, but fails to inpaint the shadows and masks cast by foreground objects (rows 2 and 6). GCD, which is trained on Kubric-4D, performs reasonably well but struggles to preserve precise geometry. CogNVS outperforms the baselines and is closest to the ground-truth novel view in geometric consistency.

Qualitative Comparison on DyCheck

[Figure columns, left to right: input view, MegaSAM, Shape-of-Motion, MoSca, CAT4D, TrajCrafter, CogNVS (MegaSAM), CogNVS (MoSca)]
Qualitative comparison on DyCheck. Note that the baselines either fail to hallucinate unseen regions in the novel view (Shape-of-Motion, MegaSAM), produce blurry dynamic regions (MoSca, CAT4D), or fail to preserve the underlying geometry of the scene (TrajectoryCrafter). CogNVS, in contrast, synthesizes plausible and 3D-consistent novel views via test-time finetuning.

Novel-view synthesis on in-the-wild examples

[Figure columns: input video (left), novel views by CogNVS (right)]
Novel-view synthesis on in-the-wild examples. We select a wide array of in-the-wild videos of dynamic scenes (animals, robot setups, movie clips) and show, on the right, novel-view results from CogNVS.

Application of CogNVS to static scenes

[Figure columns: input video (left), novel views by CogNVS (right)]
Novel-view synthesis on static scenes. We test CogNVS on novel-view synthesis of static scenes, even though its training data was not explicitly curated to cover static environments. This shows that 3D view synthesis can be learned solely from monocular 2D videos.

Novel-view synthesis on synthetic videos

[Figure columns: input video (left), novel views by CogNVS (right)]
Novel-view synthesis on synthetic videos generated by SORA. We test the generalization of our method to synthetic data distributions, such as videos generated by SORA, and find that CogNVS also performs view synthesis well on such videos.