Test-Time Adaptation for Online Vision-Language Navigation with Feedback-based Reinforcement Learning
TL;DR: We introduce a novel test-time adaptation framework for online vision-language navigation using feedback-based reinforcement learning.
Abstract: Navigating in an unfamiliar environment during deployment poses a critical challenge for a vision-language navigation (VLN) agent. Yet, test-time adaptation (TTA) remains relatively underexplored in robotic navigation, leading us to the fundamental question: what are the key properties of TTA for online VLN? In our view, effective adaptation requires three qualities: 1) flexibility in handling different navigation outcomes, 2) interactivity with the external environment, and 3) harmony between plasticity and stability. To this end, we introduce FeedTTA, a novel TTA framework for online VLN utilizing feedback-based reinforcement learning. Specifically, FeedTTA learns by maximizing binary episodic feedback, a practical setup in which the agent receives a binary scalar after each episode indicating the success or failure of the navigation. Additionally, we propose a gradient regularization technique that leverages the binary structure of FeedTTA to achieve a balance between plasticity and stability during adaptation. Our extensive experiments on challenging VLN benchmarks demonstrate the superior adaptability of FeedTTA, even outperforming state-of-the-art offline training methods on the REVERIE benchmark with a single stream of learning.
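The abstract describes learning from a single binary scalar per episode. The sketch below illustrates how such feedback could drive an online, REINFORCE-style update; it is a minimal illustration under stated assumptions, not the paper's implementation. The rollout interface, the EMA-based gradient blending that stands in for the paper's gradient regularizer, and the hyperparameters `lam` and `beta` are all hypothetical.

```python
import torch

def feedtta_update(policy, log_probs, success, optimizer, grad_ema,
                   lam=0.5, beta=0.1):
    """One online adaptation step from a single binary episodic feedback.

    Hypothetical sketch: the EMA regularizer and all hyperparameters are
    illustrative assumptions, not the authors' exact algorithm.
    """
    # Binary episodic feedback: +1 for a successful episode, -1 otherwise.
    feedback = 1.0 if success else -1.0

    # REINFORCE-style objective: maximize the expected binary feedback
    # over the action log-probabilities collected during the episode.
    loss = -feedback * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()

    # Illustrative gradient regularization (an assumption, not the paper's
    # exact rule): blend the fresh gradient with an exponential moving
    # average of past gradients, so a single noisy episode cannot erase
    # accumulated knowledge, trading plasticity against stability.
    with torch.no_grad():
        for p, g in zip(policy.parameters(), grad_ema):
            if p.grad is None:
                continue
            g.mul_(1.0 - beta).add_(p.grad, alpha=beta)  # update running grad
            p.grad.lerp_(g, lam)                         # stabilized gradient

    optimizer.step()
    return loss.item()
```

Here `log_probs` would be the per-step action log-probabilities collected while the agent acts, and `grad_ema` can be initialized as `[torch.zeros_like(p) for p in policy.parameters()]`.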
Lay Summary: Imagine a robot trying to find its way around a new place based on spoken instructions, like "go to the red couch." This is called Vision-Language Navigation (VLN). Our research started because these robots often struggle in environments they haven't seen before. While such robots are trained beforehand, adapting on the fly in a new setting has received little attention. We asked: what's crucial for a robot to learn and adjust as it navigates in real time?
To solve this, we created FeedTTA, a new way for robots to learn while they navigate. Our method uses simple feedback: a "yes" or "no" at the end of each attempt that tells the robot whether it succeeded. We also developed a technique that helps the robot learn quickly without forgetting what it already knows, balancing adaptability and stability.
Our work matters because it shows a more effective way for navigation robots to handle unfamiliar situations. Our approach outperforms even the best offline-trained methods on a challenging benchmark, meaning robots could become much better at following instructions in the real world, even in places they have never seen before.
Primary Area: Applications->Robotics
Keywords: vision-language navigation, test-time adaptation
Submission Number: 920