WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

Zizun Li; Jianjun Zhou; Yifan Wang; Haoyu Guo; Wenzheng Chang; Yang Zhou; Haoyi Zhu; Junyi Chen; Chunhua Shen; Tong He


Published: 26 Jan 2026, Last Modified: 11 Apr 2026, ICLR 2026 Poster, License: CC BY 4.0

Keywords: computer vision, 3D reconstruction, machine learning

TL;DR: WinT3R: a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps.

Abstract: We present WinT3R, a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps. Previous methods suffer from a trade-off between reconstruction quality and real-time performance. To address this, we first introduce a sliding window mechanism that ensures sufficient information exchange among frames within the window, thereby improving the quality of geometric predictions without introducing a large amount of extra computation. In addition, we leverage a compact representation of cameras and maintain a global camera token pool, which enhances the reliability of camera pose estimation without sacrificing efficiency. These designs enable WinT3R to achieve state-of-the-art performance in terms of online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets.
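The abstract describes the sliding window and the global camera token pool only at a high level. Below is a minimal, hypothetical Python sketch of how the two could interact in a streaming loop. It is not WinT3R's architecture. The following are all illustrative assumptions: the window size and stride, one token per frame, the sine-based placeholder tokens that stand in for learned ones, and plain scaled dot-product attention over the pool.

```python
import math
from typing import Iterator, List, Tuple


def sliding_windows(num_frames: int, window_size: int, stride: int) -> Iterator[List[int]]:
    """Yield overlapping windows of frame indices over a frame stream.

    Consecutive windows share (window_size - stride) frames, so information can
    propagate from one window to the next. Parameters are hypothetical.
    """
    if not 0 < stride <= window_size:
        raise ValueError("require 0 < stride <= window_size")
    start = 0
    while start < num_frames:
        end = min(start + window_size, num_frames)
        yield list(range(start, end))
        if end == num_frames:
            break
        start += stride


class CameraTokenPool:
    """Hypothetical global pool holding one compact camera token per processed frame."""

    def __init__(self) -> None:
        self.tokens: List[List[float]] = []

    def add(self, token: List[float]) -> None:
        self.tokens.append(list(token))

    def attend(self, query: List[float]) -> List[float]:
        """Scaled dot-product attention of one query over every pooled token."""
        if not self.tokens:
            return [0.0] * len(query)  # no history yet
        scale = 1.0 / math.sqrt(len(query))
        scores = [scale * sum(q * k for q, k in zip(query, tok)) for tok in self.tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        dim = len(self.tokens[0])
        return [sum(w * tok[d] for w, tok in zip(weights, self.tokens)) / total
                for d in range(dim)]


def run_stream(num_frames: int, window_size: int, stride: int,
               dim: int = 4) -> Tuple[CameraTokenPool, List[int], List[List[float]]]:
    """Process frames window by window, pooling each new frame's token once."""
    pool = CameraTokenPool()
    processed: List[int] = []
    contexts: List[List[float]] = []
    for window in sliding_windows(num_frames, window_size, stride):
        for f in window:
            if f < len(pool.tokens):  # already pooled by an earlier, overlapping window
                continue
            token = [math.sin(f + d) for d in range(dim)]  # placeholder for a learned token
            contexts.append(pool.attend(token))  # history summary a pose head could use
            pool.add(token)
            processed.append(f)
    return pool, processed, contexts
```

For example, `run_stream(10, 4, 2)` visits the windows `[0..3]`, `[2..5]`, `[4..7]` and `[6..9]`. Each frame's token enters the pool exactly once, and every later frame can attend to all earlier ones. Because of this design, the per-window cost stays bounded by the window size. The pool grows by only one small vector per frame.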

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 879
