

Here we show Gaussian Splats trained with our estimated poses, intrinsics, and dense depth maps on popular scenes:
This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence. Our method performs per-video gradient-descent minimization of a simple least-squares objective that compares the optical flow induced by depth, intrinsics, and poses against correspondences obtained via off-the-shelf optical flow and point tracking. Alongside the use of point tracks to encourage long-term geometric consistency, we introduce a differentiable re-parameterization of depth, intrinsics, and pose that is amenable to first-order optimization. We empirically show that camera parameters and dense depth recovered by our method enable photo-realistic novel view synthesis on 360° trajectories using Gaussian Splatting. Our method not only far outperforms prior gradient-descent-based bundle adjustment methods, but surprisingly performs on par with COLMAP, the state-of-the-art SfM method, on the downstream task of 360° novel view synthesis, even though our method is purely gradient-descent-based, fully differentiable, and presents a complete departure from conventional SfM. Our result opens the door to the self-supervised training of neural networks that perform camera parameter estimation, 3D reconstruction, and novel view synthesis.
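To make the objective concrete, here is a minimal sketch, not the authors' implementation, of the flow-reprojection loss for a single pair of adjacent frames. The function names `induced_flow` and `flow_loss`, and the assumptions of a shared pinhole intrinsics matrix and a single relative pose per frame pair, are ours for illustration:

```python
# Hypothetical sketch of FlowMap's least-squares flow objective (PyTorch).
# Assumptions (ours, not from the paper text): a shared pinhole intrinsics
# matrix K and a 4x4 rigid transform from frame i's camera to frame i+1's.
import torch


def induced_flow(depth: torch.Tensor, K: torch.Tensor, T_rel: torch.Tensor) -> torch.Tensor:
    """Optical flow from frame i to frame i+1 induced by depth, intrinsics, and pose.

    depth: (H, W) dense depth map of frame i
    K:     (3, 3) pinhole intrinsics
    T_rel: (4, 4) rigid transform from frame i's camera to frame i+1's camera
    """
    H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype),
        torch.arange(W, dtype=depth.dtype),
        indexing="ij",
    )
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).reshape(-1, 3)  # (HW, 3)

    # Unproject pixels into frame i's camera space, then move them to frame i+1.
    cam = (torch.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)
    cam_h = torch.cat([cam, torch.ones_like(cam[:, :1])], dim=-1)  # homogeneous
    cam_next = (T_rel @ cam_h.T).T[:, :3]

    # Reproject into frame i+1's image plane; flow is the pixel displacement.
    proj = (K @ cam_next.T).T
    pix_next = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    return (pix_next - pix[:, :2]).reshape(H, W, 2)


def flow_loss(depth, K, T_rel, flow_estimated):
    """Least-squares mismatch between induced flow and off-the-shelf flow."""
    return ((induced_flow(depth, K, T_rel) - flow_estimated) ** 2).mean()
```

In the full method, a loss of this form is summed over frame pairs (and, analogously, over long-range point tracks) and minimized per video by gradient descent, with depth, intrinsics, and pose expressed through the differentiable re-parameterization described above rather than optimized as raw variables.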
We compare test-view reconstructions against those of NoPe-NeRF, COLMAP, and DROID-SLAM. *Requires known intrinsics.
Note that we fit a smoothed path to each method's estimated poses, so the rendered trajectory and the estimated poses do not align exactly.
Splats trained with our method are comparable to, and often even outperform, those trained with COLMAP.
Here we show pose plots and estimated depth maps for popular scenes.
Here we render smoothed trajectories for additional Gaussian Splats trained with FlowMap's estimates on popular scenes.