FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent

Here we show Gaussian Splats trained with our poses, intrinsics, and dense depths on popular scenes:






Abstract

This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence. Our method performs per-video gradient-descent minimization of a simple least-squares objective that compares the optical flow induced by depth, intrinsics, and poses against correspondences obtained via off-the-shelf optical flow and point tracking. Alongside the use of point tracks to encourage long-term geometric consistency, we introduce a differentiable re-parameterization of depth, intrinsics, and pose that is amenable to first-order optimization. We empirically show that camera parameters and dense depth recovered by our method enable photo-realistic novel view synthesis on 360° trajectories using Gaussian Splatting. Our method not only far outperforms prior gradient-descent-based bundle adjustment methods, but surprisingly performs on par with COLMAP, the state-of-the-art SfM method, on the downstream task of 360° novel view synthesis, even though our method is purely gradient-descent-based, fully differentiable, and represents a complete departure from conventional SfM. Our result opens the door to the self-supervised training of neural networks that perform camera parameter estimation, 3D reconstruction, and novel view synthesis.
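To make the objective described in the abstract concrete, here is a minimal NumPy sketch of how depth, intrinsics, and a relative pose induce an optical flow field, and how that field can be compared against an observed flow with a least-squares loss. This is an illustration under stated assumptions, not FlowMap's implementation: the function names `induced_flow` and `flow_loss`, the pixel-center convention, and the pose convention `X_j = R @ X_i + t` are all hypothetical choices made for this example, and the re-parameterization and point-track terms mentioned in the abstract are not shown.

```python
import numpy as np

def induced_flow(depth, K, R, t):
    """Flow from frame i to frame j implied by geometry (illustrative only).

    depth: (H, W) z-depth of frame i
    K:     (3, 3) pinhole intrinsics, assumed shared by both frames
    R, t:  relative pose, assumed convention X_j = R @ X_i + t
    """
    h, w = depth.shape
    # Pixel grid in (x, y) order, sampled at pixel centers (an assumption).
    xs, ys = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)

    # Unproject each pixel: X_i = depth * K^{-1} [x, y, 1]^T
    rays = pix @ np.linalg.inv(K).T
    pts_i = rays * depth[..., None]

    # Move the points into frame j's camera coordinates.
    pts_j = pts_i @ R.T + t

    # Project into frame j and subtract the original pixel locations.
    proj = pts_j @ K.T
    xy_j = proj[..., :2] / proj[..., 2:3]
    return xy_j - pix[..., :2]  # (H, W, 2)

def flow_loss(flow_pred, flow_obs):
    """Mean squared endpoint error between induced and observed flow."""
    return np.mean(np.sum((flow_pred - flow_obs) ** 2, axis=-1))

# Toy check: a fronto-parallel plane at depth 2 and a sideways translation
# of 0.1 give a uniform horizontal flow of fx * tx / depth = 5 pixels.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 2.0)
flow = induced_flow(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
loss_at_truth = flow_loss(flow, flow)
```

In a gradient-descent setting, the same computation would be written in an automatic-differentiation framework so that the loss can be backpropagated to depth, intrinsics, and pose; plain NumPy is used here only to keep the geometry easy to follow.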

Point Clouds and Cameras from our Method

Side-by-Side Reconstructions

We compare test view reconstructions with NoPe-NeRF, COLMAP, and DROID-SLAM. *Requires known intrinsics.


Gaussian Splats from FlowMap vs COLMAP:

Note that we fit a smoothed path to the poses estimated by each method, so the alignment between the two renderings is imperfect.
Splats trained with our method are comparable to, and often even outperform, those trained with COLMAP.


Pose Plots

Here we show pose plots and estimated depth maps for popular scenes.



Splats Initialized with our Poses and Geometry

(Raw Video → FlowMap → Gaussian Splatting)

Here we render smoothed trajectories for additional Gaussian Splats trained with FlowMap's estimates on popular scenes.