Zero-Shot Reinforcement Learning Under Partial Observability

Published: 09 May 2025 · Last Modified: 28 May 2025 · RLC 2025 · CC BY 4.0
Keywords: zero-shot RL; behaviour foundation models; foundation policies; pomdps
TL;DR: We show that zero-shot RL can be performed under partial observability by augmenting SoTA methods with memory models
Abstract: Recent work has shown that, under certain assumptions, zero-shot reinforcement learning (RL) methods can generalise to *any* unseen task in an environment after an offline, reward-free pre-training phase. Access to Markov states is one such assumption, yet, in many real-world applications, the Markov state is only *partially observable*. Here, we explore how the performance of standard zero-shot RL methods degrades under partial observability, and show that, as in single-task RL, memory-based architectures are an effective remedy. We evaluate our *memory-based* zero-shot RL methods in domains where states, rewards, and changes in dynamics are only partially observed, and show improved performance over memory-free baselines. Our anonymised code is available via: https://anonymous.4open.science/r/rlc2025/.
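A minimal sketch of the core idea described above, not the authors' implementation: a recurrent memory model summarises the observation-action history into a latent state that stands in for the unobserved Markov state, and a task-conditioned zero-shot policy head acts on that summary. All module names, dimensions, and the GRU choice are illustrative assumptions.

```python
import torch
import torch.nn as nn


class MemoryEncoder(nn.Module):
    """Summarises an observation-action history into a latent state."""

    def __init__(self, obs_dim: int, action_dim: int, latent_dim: int = 128):
        super().__init__()
        self.gru = nn.GRU(obs_dim + action_dim, latent_dim, batch_first=True)

    def forward(self, obs_seq: torch.Tensor, act_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, T, obs_dim), act_seq: (batch, T, action_dim)
        _, h_n = self.gru(torch.cat([obs_seq, act_seq], dim=-1))
        return h_n.squeeze(0)  # (batch, latent_dim) summary of the history


class ZeroShotPolicy(nn.Module):
    """Stand-in for a zero-shot RL policy head (e.g. a forward-backward or
    successor-feature policy) conditioned on a task embedding z."""

    def __init__(self, latent_dim: int, z_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + z_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, state: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.net(torch.cat([state, z], dim=-1)))


# Usage: the memory encoder's output replaces the (unobserved) Markov state.
encoder = MemoryEncoder(obs_dim=24, action_dim=6)
policy = ZeroShotPolicy(latent_dim=128, z_dim=50, action_dim=6)
obs_hist = torch.randn(1, 10, 24)   # last 10 partial observations
act_hist = torch.randn(1, 10, 6)    # actions taken alongside them
z = torch.randn(1, 50)              # task embedding inferred at test time
action = policy(encoder(obs_hist, act_hist), z)
```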
Submission Number: 245