Keywords: 3D reconstruction
TL;DR: FastVGGT observes strong redundancy in VGGT's attention maps and leverages it for training-free acceleration.
Abstract: Scaling visual geometry transformers to long image sequences poses a significant computational and memory challenge. In this work, we diagnose this issue in the state-of-the-art model VGGT and trace the primary bottleneck to its Global Attention layers. Our analysis reveals a "token collapse" phenomenon, in which many tokens attend to nearly identical regions, resulting in redundant computation. Motivated by this finding, we propose FastVGGT, a training-free framework that strategically prunes these redundant tokens. Instead of merging tokens uniformly, FastVGGT employs a tailored three-part token partitioning strategy: it preserves initial-frame tokens as a stable global reference, retains salient tokens to maintain fine details, and applies region-based random sampling to ensure spatially balanced coverage. Extensive experiments on multiple 3D geometry benchmarks validate the effectiveness of our approach. Notably, on sequences of 1000 images, FastVGGT achieves a 4$\times$ speedup over the original VGGT while simultaneously mitigating error accumulation, demonstrating its efficiency and robustness in long-sequence scenarios. For further details, please visit our project page: https://fastvggt.github.io/.
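The three-part partitioning described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, all parameters (`salient_frac`, `region_grid`, `sample_frac`), and the use of feature norm as a saliency proxy are assumptions for exposition only.

```python
import numpy as np

def partition_tokens(tokens, num_frames, tokens_per_frame,
                     salient_frac=0.1, region_grid=4, sample_frac=0.25,
                     rng=None):
    """Illustrative three-part token selection (hypothetical parameters).

    tokens: (N, D) array with N = num_frames * tokens_per_frame;
            frame-0 tokens occupy the first tokens_per_frame rows.
    Returns sorted indices of the tokens to keep.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    N = num_frames * tokens_per_frame
    assert tokens.shape[0] == N

    # 1) Preserve all initial-frame tokens as a stable global reference.
    keep = set(range(tokens_per_frame))

    # 2) Retain salient tokens from later frames; saliency is proxied
    #    here by feature norm (an assumption, not the paper's criterion).
    later = np.arange(tokens_per_frame, N)
    norms = np.linalg.norm(tokens[later], axis=1)
    k = max(1, int(salient_frac * later.size))
    keep.update(later[np.argsort(norms)[-k:]].tolist())

    # 3) Region-based random sampling: split the remaining tokens into
    #    contiguous regions and sample a fixed fraction from each one,
    #    giving spatially balanced coverage.
    for region in np.array_split(later, region_grid):
        m = max(1, int(sample_frac * region.size))
        keep.update(rng.choice(region, size=m, replace=False).tolist())

    return np.sort(np.fromiter(keep, dtype=int))
```

Because selection is a pure index computation, a scheme like this can be dropped in front of a frozen attention layer without any retraining, which is the sense in which the acceleration is training-free.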
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 5717