Keywords: computer vision, 3D reconstruction, machine learning
TL;DR: WinT3R: a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps.
Abstract: We present WinT3R, a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps.
Previous methods suffer from a trade-off between reconstruction quality and real-time performance.
To address this, we first introduce a sliding window mechanism that ensures sufficient information exchange among frames within the window, thereby improving the quality of geometric predictions with little additional computation.
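The abstract does not specify the window length, the stride, or how consecutive windows overlap. The following is a minimal Python sketch, assuming a fixed window length and a stride no larger than it, of how an incoming frame stream could be split into overlapping windows so that frames in the overlap are processed alongside neighbors from both sides. `sliding_windows`, `window`, and `stride` are hypothetical names, not WinT3R's interface.

```python
from typing import List


def sliding_windows(num_frames: int, window: int, stride: int) -> List[List[int]]:
    """Split frame indices 0..num_frames-1 into overlapping windows.

    Hypothetical scheme: each window holds up to `window` consecutive frames,
    and the next window starts `stride` frames later, so with stride < window
    consecutive windows share (window - stride) frames.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("need 0 < stride <= window")
    if num_frames <= 0:
        return []
    windows: List[List[int]] = []
    start = 0
    while True:
        end = min(start + window, num_frames)
        windows.append(list(range(start, end)))  # frames processed jointly
        if end == num_frames:
            break
        start += stride
    return windows


if __name__ == "__main__":
    # With 7 frames, window 4 and stride 2, frames 2-3 and 4-5 each appear twice.
    print(sliding_windows(7, 4, 2))
```

Running the example prints `[[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6]]`. In this sketch, the cost per window is bounded by the window length rather than by the full sequence length, which is one way a windowed design can keep extra computation small.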
In addition, we leverage a compact representation of cameras and maintain a global camera token pool, which enhances the reliability of camera pose estimation without sacrificing efficiency.
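The abstract does not describe how the global camera token pool is read. As a rough illustration only, the sketch below assumes each processed frame contributes one fixed-length camera token to a global pool, and that a new camera query reads from the whole pool with single-head scaled dot-product attention, using the pooled tokens as both keys and values. `CameraTokenPool`, `add`, and `attend` are hypothetical names; the actual WinT3R camera head may work differently.

```python
import math
from typing import List

Vector = List[float]


class CameraTokenPool:
    """Hypothetical global pool holding one compact camera token per frame seen so far."""

    def __init__(self) -> None:
        self.tokens: List[Vector] = []

    def add(self, new_tokens: List[Vector]) -> None:
        # Append the camera tokens produced for newly processed frames.
        self.tokens.extend(new_tokens)

    def attend(self, query: Vector) -> Vector:
        """Single-head scaled dot-product attention of one query over all pooled tokens."""
        if not self.tokens:
            return list(query)  # nothing to attend to yet
        d = len(query)
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in self.tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of pooled tokens (tokens act as values).
        return [sum(w * tok[i] for w, tok in zip(weights, self.tokens)) for i in range(d)]


if __name__ == "__main__":
    pool = CameraTokenPool()
    pool.add([[1.0, 0.0], [0.0, 1.0]])
    print(pool.attend([0.0, 0.0]))
```

Here the zero query scores both tokens equally, so the result is their average. Because each frame adds only a single short vector in this sketch, the pool stays small relative to full image features even as the sequence grows, which illustrates why a compact camera representation can make global context cheap to keep.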
These designs enable WinT3R to achieve state-of-the-art performance in terms of online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 879