SC$^{2}$-WM: A Self-Correcting World Model with Closed-Loop Feedback for Vision-and-Language Navigation in Continuous Environments

Xuan Yao; Yuze Zhu; Junyu Gao; Zongmeng Wang; Changsheng Xu

Published: 30 Apr 2026, Last Modified: 24 Jun 2026 | ICML 2026 regular | Everyone | CC BY 4.0

TL;DR: We introduce a closed-loop world model that derives internal feedback from action-conditioned foresight to perform self-correction in VLN-CE.

Abstract: Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to make fine-grained navigation decisions under partial observability. However, most existing methods rely on open-loop execution and lack mechanisms to detect and correct internal state drift during inference. We propose SC$^{2}$-WM, a self-correcting world model framework that introduces internal feedback for closed-loop decision making in VLN-CE. Our method derives feedback from world-model foresight to perform state-level plan refinement before action execution. To handle challenging scenarios, we further introduce conditional world-aware adaptation, which enables model-level correction by selectively updating the world model at test time when feedback indicates that the model's capacity is insufficient. Experiments on standard VLN-CE benchmarks demonstrate improved navigation robustness and generalization. Our code is available at https://github.com/sunrise-ikun/SC2_WM.

Lay Summary: Vision-and-Language Navigation in Continuous Environments (VLN-CE) studies how robots navigate unfamiliar environments by following natural language instructions. However, existing systems often make decisions in an open-loop manner, meaning they cannot recognize when their internal understanding becomes unreliable during navigation. As a result, errors may gradually accumulate over time. We develop SC$^{2}$-WM, a self-correcting navigation framework that allows agents to internally evaluate and refine their decisions while moving. Our method uses a world model to imagine possible future outcomes before executing actions and generates internal feedback to correct inconsistent navigation plans. In challenging situations, the system can further adapt its internal model online to better handle previously unseen environments. Experiments on standard VLN-CE benchmarks show improved robustness and generalization in complex continuous environments.

Originally Submitted Supplementary Material: zip

Link To Code: https://github.com/sunrise-ikun/SC2_WM

Primary Area: Applications->Computer Vision

Keywords: Vision-and-Language Navigation in Continuous Environments, World Model, Closed-Loop

Originally Submitted PDF: pdf

Submission Number: 6618
