Towards Physically Executable 3D Gaussian for Embodied Navigation

Published: 26 Jan 2026, Last Modified: 28 Feb 2026, ICLR 2026 Poster, CC BY 4.0
Keywords: 3D Gaussian Splatting; Vision-and-Language Navigation
Abstract: 3D Gaussian Splatting (3DGS), a 3D representation method with photorealistic real-time rendering capabilities, is regarded as an effective tool for narrowing the sim-to-real gap. However, it lacks the fine-grained semantics and physical executability required for Vision-and-Language Navigation (VLN). To address this, we propose **SAGE-3D** (**S**emantically and Physically **A**ligned **G**aussian **E**nvironments for **3D** Navigation), a new paradigm that upgrades 3DGS into an executable, semantically and physically aligned environment. It comprises two components: **(1) Object-Centric Semantic Grounding**, which adds fine-grained object-level annotations to 3DGS; and **(2) Physics-Aware Execution Jointing**, which embeds collision objects into 3DGS and constructs rich physical interfaces. We release **InteriorGS**, a dataset of 1K object-annotated 3DGS indoor scenes, and introduce **SAGE-Bench**, the first 3DGS-based VLN benchmark, with 2M VLN data samples. Experiments show that training on 3DGS scene data is harder to converge but yields strong generalizability, improving baseline performance by 31% on the VLN-CE Unseen task.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8941