For large-scale deployment, autonomous agents must perform their tasks not only in their training environment but also in environments they have never seen before, such as robots transferred from controlled testbeds to households. Traditional approaches improve adaptability either during training, by employing varied environments, or during deployment, by relying on fine-tuning. However, the former often fails under unforeseen conditions, while the latter requires access to true reward labels, which are usually unavailable outside controlled settings. In this work, we address the challenge of adapting to environments whose dynamics and observations differ from those of the training environment, without explicit reward signals. We identify that learned task objectives, represented by reward models, are often transferable even when policies are not, as they are more robust to changes in dynamics. However, reward model performance in target environments remains vulnerable to observational shifts such as changes in lighting or added noise. To address this, our key insight is to adapt the reward model at test time using a self-supervised learning framework. We empirically demonstrate that adapting the reward model with our method enables policies to solve tasks under new challenges, such as added noise, obstacles, or reversed dynamics, where traditional policy transfer and naive reward transfer fail.
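To make the idea of test-time reward adaptation concrete, below is a minimal, hypothetical sketch in PyTorch. It assumes a simple self-supervised consistency objective (rewards should be invariant to small perturbations of target-environment observations) and encoder-only updates; the names `RewardModel` and `adapt_reward_model`, the architecture, and the loss are illustrative assumptions, not the paper's exact method.

```python
# Sketch: test-time adaptation of a learned reward model on unlabeled
# target-environment observations, using a self-supervised consistency loss.
# Architecture, augmentation, and objective are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Maps an observation to a scalar reward via an encoder and a linear head."""

    def __init__(self, obs_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(obs)).squeeze(-1)


def adapt_reward_model(reward_model: RewardModel,
                       target_obs: torch.Tensor,
                       steps: int = 100,
                       lr: float = 1e-4,
                       noise_std: float = 0.05) -> RewardModel:
    """Adapt only the encoder using unlabeled observations from the target environment.

    Self-supervised signal (an assumption): predicted rewards should agree
    between a clean observation and a slightly perturbed view of it.
    """
    optimizer = torch.optim.Adam(reward_model.encoder.parameters(), lr=lr)
    for _ in range(steps):
        idx = torch.randint(0, target_obs.shape[0], (64,))
        batch = target_obs[idx]
        perturbed = batch + noise_std * torch.randn_like(batch)
        loss = F.mse_loss(reward_model(perturbed), reward_model(batch).detach())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return reward_model
```

In this sketch, the adapted model would then label target-environment transitions (e.g., `r_hat = reward_model(next_obs)`) so a policy can continue learning without ground-truth rewards; the specific self-supervised objective used by the paper may differ.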