TinyVGGT: Lossless Post-Training Quantization for Visual Geometry Grounded Transformer

05 Sept 2025 (modified: 25 Sept 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: model quantization, 3D computer vision
Abstract: The Visual Geometry Grounded Transformer (VGGT) represents a significant advancement in 3D computer vision, demonstrating state-of-the-art performance in inferring key 3D attributes such as camera parameters, depth maps, and point clouds directly from images. However, its substantial model size poses a major barrier to deployment on resource-constrained edge devices such as unmanned aerial vehicles (UAVs), limiting its real-world applicability. To address this limitation, we introduce TinyVGGT, a quantization framework tailored to compress VGGT. Our approach starts from the observation that the transformer blocks within VGGT exhibit heterogeneous sensitivity to quantization. We therefore propose a performance-aware quantization strategy that applies layer-wise mixed-precision quantization to minimize the cumulative error. Furthermore, we design a specialized calibration scheme that smooths the distribution of camera tokens to preserve high-precision geometric information. Finally, to systematically align quantization with the 3D prediction task, we propose a scale search mechanism in which candidate scales are evaluated not only by per-head reconstruction error but also by enforcing geometric consistency among camera poses, depth maps, and point maps. Extensive experiments on various geometry perception benchmarks demonstrate that our method achieves lossless 4-bit quantization, preserving performance on the prediction of all 3D attributes while reducing the overall model size by more than 50%. This work makes the deployment of high-fidelity 3D reconstruction on edge platforms feasible, unlocking real-world applications under limited resources.
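The abstract does not spell out the scale-search procedure, but the following is a minimal sketch of what a per-layer search of the kind described could look like, assuming symmetric 4-bit fake quantization in PyTorch. The `geo_penalty` callable, the candidate grid, and the weight `lam` are illustrative assumptions standing in for the paper's geometric-consistency check among camera poses, depth maps, and point maps; none of these names are taken from the submission.

```python
# Hedged sketch: per-layer quantization scale search combining a
# reconstruction term with a hypothetical geometric-consistency penalty.
import torch

def fake_quantize(w: torch.Tensor, scale: float, n_bits: int = 4) -> torch.Tensor:
    """Symmetric fake quantization: round to an integer grid, then rescale."""
    qmax = 2 ** (n_bits - 1) - 1
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale

def search_scale(w: torch.Tensor,
                 n_bits: int = 4,
                 candidates=torch.linspace(0.5, 1.2, 15),
                 geo_penalty=None,   # assumed callable: quantized weights -> float
                 lam: float = 0.1) -> float:
    """Pick the scale minimizing reconstruction MSE plus (assumption) a
    weighted geometric-consistency penalty, mirroring the abstract's idea
    of evaluating candidate scales by more than per-head reconstruction."""
    base = w.abs().max() / (2 ** (n_bits - 1) - 1)  # naive max-abs scale
    best_scale, best_loss = float(base), float("inf")
    for c in candidates:
        scale = float(base * c)
        w_q = fake_quantize(w, scale, n_bits)
        loss = torch.mean((w - w_q) ** 2).item()
        if geo_penalty is not None:
            loss += lam * geo_penalty(w_q)
        if loss < best_loss:
            best_loss, best_scale = loss, scale
    return best_scale

if __name__ == "__main__":
    w = torch.randn(256, 256)
    print(search_scale(w))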
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2026/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2335