This video presents NavTrust, our unified benchmark for stress-testing the trustworthiness of embodied navigation across both Vision-Language Navigation (VLN) and Object-Goal Navigation (OGN).

We begin with a concise tour of NavTrust's design: VLN and OGN episodes are aligned to the same start/goal pairs in Matterport3D scenes, and agents are evaluated under coordinated RGB corruptions (blur, low-light, spatter, flare, foreign-object, blackout, defocus), depth corruptions (Gaussian noise, missing data, multipath, quantization), and instruction corruptions (style rewrites, capitalization, masking, and black-/white-box prompt injections). We also introduce a new metric, the Performance Retention Score (PRS), which quantifies how much success an agent retains under corruption relative to clean runs.
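As a rough illustration of the metric's intent, a retention score can be computed as the ratio of corrupted to clean success rate. This is a minimal sketch under that assumption; the paper's exact PRS formulation (e.g., averaging over corruption types or severities) may differ, and the function name is hypothetical:

```python
def performance_retention_score(clean_sr: float, corrupted_sr: float) -> float:
    """Fraction of clean success rate retained under corruption.

    Illustrative formulation only: PRS = SR_corrupted / SR_clean.
    A PRS of 1.0 means no degradation; 0.0 means total failure.
    """
    if clean_sr == 0.0:
        return 0.0  # undefined ratio; report zero retention
    return corrupted_sr / clean_sr
```

For example, an agent whose success rate falls from 0.60 (clean) to 0.45 (corrupted) retains 75% of its performance.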

Next, we present quantitative results for six state-of-the-art agents (ETPNav, NaVid-7B, WMNav, L3MVN, PSL, VLFM), showing double-digit success-rate drops under realistic sensor and language stressors. Depth corruptions emerge as a universal Achilles' heel, while late-fusion designs and panoramic inputs improve robustness; instruction attacks expose tokenizer and prompt-handling brittleness.

We then visualize top-down trajectories in photorealistic MP3D/HM3D scenes to illustrate failure modes—e.g., drift under low-lighting with noise, stall under depth quantization, and detours triggered by adversarial prompts—contrasted with clean baselines.

We systematically evaluate four distinct strategies for enhancing robustness: data augmentation, teacher-student knowledge distillation, lightweight adapter tuning, and a safeguard LLM. Our experiments offer a practical path toward developing more resilient embodied agents.
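To make one of these strategies concrete, lightweight adapter tuning typically inserts a small residual bottleneck module into a frozen backbone so that only the adapter's parameters are trained. The sketch below is a generic, numpy-only illustration of that pattern (class name, dimensions, and zero-initialized up-projection are my assumptions, not NavTrust's implementation):

```python
import numpy as np

class BottleneckAdapter:
    """Residual down-project / ReLU / up-project adapter (illustrative sketch).

    The up-projection is zero-initialized so the adapter starts as an
    identity map and cannot hurt the frozen backbone before training.
    """

    def __init__(self, d_model: int, d_bottleneck: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
        self.W_up = np.zeros((d_bottleneck, d_model))  # identity at init

    def __call__(self, x: np.ndarray) -> np.ndarray:
        h = np.maximum(x @ self.W_down, 0.0)  # ReLU bottleneck
        return x + h @ self.W_up              # residual connection
```

Because only `W_down` and `W_up` are updated, the approach adds a small fraction of the backbone's parameter count while leaving clean-run behavior intact at initialization.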

Finally, we show how NavTrust's corruption suite doubles as a plug-and-play augmentation library that practitioners can use during training to harden models, and we preview the forthcoming code release and public leaderboard, encouraging the community to report both peak accuracy and robustness.