Keywords: 4D Reconstruction, 3D Tracking, Multi-Modal Reconstruction, Metric 3D Reconstruction
TL;DR: Any4D is a unified feed-forward architecture that faithfully reconstructs the geometry and motion of a dynamic scene from any combination of input modalities and any number of views.
Abstract: We present Any4D, a framework for feed-forward metric-scale dense 4D reconstruction. Compared to other recent methods for feed-forward 4D reconstruction from monocular RGB videos, Any4D is multimodal by design: its focus on diverse camera setups allows it to process additional modalities and sensors when available, such as RGB-D frames, IMU-based egomotion, and Doppler measurements from radar. Moreover, Any4D directly generates dense feed-forward predictions for $N$ frames, in contrast to prior work that typically focuses on either two-view dense scene flow or sparse 3D point tracking. One innovation that enables such flexible input modalities is a modular approach to representing the 4D scene: predictions are encoded using a combination of egocentric factors (such as depth maps and camera intrinsics) represented in local camera coordinates, and allocentric factors (such as camera extrinsics and scene flow) represented in global world coordinates. We show that Any4D achieves superior performance over existing methods across diverse sensor setups, in both accuracy and compute efficiency, opening up avenues for real-time deployment in downstream robotics applications.
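The egocentric/allocentric factorization described in the abstract can be illustrated with a minimal sketch: egocentric factors (a depth map and camera intrinsics) lift pixels to 3D points in local camera coordinates, while allocentric factors (a camera-to-world extrinsic pose and per-point scene flow) place and move those points in global world coordinates. All function and variable names here are hypothetical illustrations, not the paper's actual API.

```python
import numpy as np

def unproject_depth(depth, K):
    """Egocentric factors: lift a depth map (h, w) to 3D points in local
    camera coordinates using the 3x3 intrinsics matrix K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)  # (h, w, 3)
    rays = pix @ np.linalg.inv(K).T       # pixel coordinates -> camera rays
    return rays * depth[..., None]        # scale each ray by its depth

def to_world(points_cam, T_world_cam, scene_flow=None):
    """Allocentric factors: map camera-frame points to world coordinates via
    the 4x4 extrinsic pose, then optionally apply per-point scene flow
    (a world-frame displacement toward the next timestep)."""
    pts_h = np.concatenate([points_cam, np.ones_like(points_cam[..., :1])], axis=-1)
    pts_world = (pts_h @ T_world_cam.T)[..., :3]
    if scene_flow is not None:
        pts_world = pts_world + scene_flow
    return pts_world
```

The appeal of this split is that each sensor naturally supplies one factor: an RGB-D frame constrains the egocentric depth, while IMU egomotion constrains the allocentric extrinsics, so modalities can be added or dropped without changing the scene representation.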
Submission Number: 24