Projection-Domain Adaptation for 3D Transformers: From Perspective to Panoramic Scene Reconstruction
Keywords: World Model; 3D Reconstruction; Domain Adaptation
Abstract: World models increasingly rely on panoramic perception, as omnidirectional views provide geometry-consistent observations crucial for spatial reasoning. However, existing panoramic world models are predominantly built on video representations, which lack explicit 3D structure. In contrast, large-scale 3D Transformers such as VGGT excel at scene reconstruction from perspective inputs but degrade under equirectangular projection (ERP) due to a projection-domain mismatch. We introduce Projection-Domain Adaptation, a principled framework that restores the geometric invariances broken by ERP. Our method consists of three innovations: ray-field alignment, which embeds explicit 3D rays to establish a rotation-consistent reference space; ray-enriched LoRA adaptation, which achieves panoramic specialization with less than 0.5% trainable parameters; and latitude-aware depth uncertainty, which leverages the spherical Jacobian to correct ERP’s non-uniform reliability. Experiments demonstrate that our framework substantially outperforms zero-shot VGGT and plain full finetuning, while our LoRA variant attains accuracy comparable to our full finetuning setting with over 70× fewer parameters and 26× lower training cost. These results highlight a generalizable pathway for building panoramic world models grounded in 3D geometry, moving beyond the limitations of video-based approaches.
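The abstract does not specify the form of the latitude-aware depth uncertainty; a minimal sketch under the standard assumption that the ERP spherical Jacobian is cos(latitude), so per-row depth-loss weights shrink toward the poles where ERP pixels cover less solid angle (the function name and weighting scheme are illustrative, not the paper's exact formulation):

```python
import numpy as np

def erp_latitude_weights(height: int) -> np.ndarray:
    """Per-row reliability weights for an ERP depth map (illustrative sketch)."""
    # Latitude of each pixel-row center, from +pi/2 (top) to -pi/2 (bottom).
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    # Spherical Jacobian of ERP: solid angle per pixel scales with cos(latitude),
    # so rows near the poles are oversampled and carry less reliable evidence.
    return np.cos(lat)

# Example: weight a per-pixel depth loss by row before averaging.
w = erp_latitude_weights(512)          # shape (512,), ~1 at the equator, ~0 at poles
```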
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 10658