Keywords: Underwater, Depth Estimation, Foundation Model, 3D Reconstruction
Abstract: Underwater 3D reconstruction poses significant challenges due to the scarcity of large-scale labeled datasets and the lack of foundation models designed for underwater scenarios. To overcome these limitations, we introduce SeaVGGT, a self-supervised framework for underwater geometric estimation that operates without annotated data or enhancement references. SeaVGGT exploits the physical principle that underwater image degradation inherently encodes scene depth, and captures this phenomenon through a graph of learnable prototypes. These prototypes encapsulate a diverse range of attenuation characteristics and are dynamically selected as context-aware conditions that modulate visual features in a depth-sensitive manner. The framework is trained end to end with a set of physics-driven self-supervision losses that enforce cyclic consistency between the original and reconstructed images under the underwater image formation model. To robustly handle variability in water types and environmental conditions, SeaVGGT adaptively refines the prototype representations conditioned on the input image, enabling strong generalization across diverse underwater domains. Extensive experiments on the FLSea, USOD10K, and SQUID datasets demonstrate that SeaVGGT achieves a 13.47% reduction in RMSE under unseen water conditions compared to the VGGT baseline, underscoring its efficacy and broad applicability.
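The abstract does not give the exact loss formulation, but the physics-driven cyclic consistency it describes can be illustrated with a minimal sketch, assuming the standard revised underwater image formation model I = J·e^(−βᴰd) + B·(1 − e^(−βᴮd)): a degraded image is re-rendered from the predicted clean radiance and depth, then compared against the observed input. All function and variable names below are hypothetical, and the choice of an L1 photometric penalty is an assumption, not the paper's method.

```python
import torch
import torch.nn.functional as F


def underwater_formation(J, depth, beta_d, beta_b, B):
    """Render a degraded underwater image from clean radiance J and depth d
    using the standard formation model: I = J*exp(-beta_d*d) + B*(1 - exp(-beta_b*d)).
    Shapes: J (N,3,H,W), depth (N,1,H,W), beta_d/beta_b/B (N or 1, 3, 1, 1)."""
    t_direct = torch.exp(-beta_d * depth)        # per-channel direct transmission
    t_scatter = 1.0 - torch.exp(-beta_b * depth) # accumulated backscatter term
    return J * t_direct + B * t_scatter


def cyclic_consistency_loss(I, J_hat, depth_hat, beta_d, beta_b, B):
    """Re-render the observed image from predictions and penalize the residual
    (illustrative L1; the actual self-supervision losses may differ)."""
    I_hat = underwater_formation(J_hat, depth_hat, beta_d, beta_b, B)
    return F.l1_loss(I_hat, I)


if __name__ == "__main__":
    # Toy usage with random tensors; beta/B values are placeholder water parameters.
    N, H, W = 2, 64, 64
    I = torch.rand(N, 3, H, W)                 # observed underwater image
    J_hat = torch.rand(N, 3, H, W)             # predicted clean radiance
    d_hat = torch.rand(N, 1, H, W) * 10.0      # predicted depth in meters
    beta_d = torch.tensor([0.40, 0.20, 0.10]).view(1, 3, 1, 1)  # attenuation (R,G,B)
    beta_b = torch.tensor([0.35, 0.25, 0.15]).view(1, 3, 1, 1)  # backscatter (R,G,B)
    B = torch.tensor([0.05, 0.30, 0.40]).view(1, 3, 1, 1)       # veiling light color
    print(cyclic_consistency_loss(I, J_hat, d_hat, beta_d, beta_b, B).item())
```

In a setup like this, the per-channel water parameters (beta_d, beta_b, B) would plausibly be supplied by the selected prototypes rather than fixed constants, which is what would let the same rendering loss adapt across water types.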
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 6107